Detailed Design of a Distributed Cache

Discover how to refine a distributed cache design by eliminating single points of failure and improving availability. Implement a configuration service for server discovery and use primary-replica sharding to ensure data consistency. Learn the internal workings of cache servers, including hash maps and LRU eviction policies.

We'll cover the following...

  • Find and remove limitations
    • Maintain the cache servers list
    • Improve availability
    • Internals of the cache server
  • Detailed design

This lesson identifies limitations in the high-level design and refines the architecture to address them.

Find and remove limitations

Before we get to the detailed design, we must resolve three specific challenges:

  • Service discovery: Cache clients have no mechanism to detect when cache servers are added or fail.

  • SPOF and performance: Storing a dataset on a single server creates a single point of failure (SPOF). In addition, frequently accessed keys (hot keys) can overload that one node and degrade performance.

  • Server internals: The design lacks details regarding internal data structures and eviction policies.
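
To make the first limitation concrete, here is a minimal sketch of a client that is configured once with a fixed server list. The class name `StaticCacheClient`, the server names, and the modulo-hash routing are assumptions made for illustration; the lesson does not prescribe them. The sketch only shows that, without a discovery mechanism, keys keep mapping to a server that has already failed.

```python
import hashlib


class StaticCacheClient:
    """Hypothetical client configured once with a fixed list of cache servers."""

    def __init__(self, servers):
        # The list is copied at start-up and never refreshed afterwards.
        self.servers = list(servers)

    def server_for(self, key):
        # Use a stable hash so the same key always maps to the same server.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


def reachable_keys(client, keys, live_servers):
    """Return the keys whose assigned server is still alive."""
    return [k for k in keys if client.server_for(k) in live_servers]


client = StaticCacheClient(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]

# Simulate cache-b failing. The client has no way to learn about it,
# so it still routes a share of the keys to the dead server.
live = {"cache-a", "cache-c"}
lost = len(keys) - len(reachable_keys(client, keys, live))
print(f"{lost} of {len(keys)} keys still map to the failed server")
```

The same problem appears in the other direction: a newly added server never receives traffic, because the client's list does not include it.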

Maintain the cache servers list

We will address the service discovery problem first. The following slides ...