Detailed Design of a Distributed Cache
Discover how to refine a distributed cache design by eliminating single points of failure and improving availability. Implement a configuration service for server discovery and use primary-replica sharding to ensure data consistency. Learn the internal workings of cache servers, including hash maps and LRU eviction policies.
We'll cover the following:
- Find and remove limitations
- Maintain the cache servers list
- Improve availability
- Internals of the cache server
- Detailed design
This lesson identifies limitations in the high-level design and refines the architecture to address them.
Find and remove limitations
Before we get to the detailed design, we must resolve three specific challenges:
Service discovery: Cache clients have no mechanism to detect when cache servers are added or fail.
SPOF and performance: Storing a dataset on a single server creates a Single Point of Failure (SPOF). Additionally, frequently accessed keys (hot keys) can overload that one node and degrade performance.
Server internals: The design lacks details regarding internal data structures and eviction policies.
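To make the third challenge concrete, here is a minimal sketch of what a cache server's internals could look like, assuming a single in-memory node that combines a hash map with least recently used (LRU) eviction. The class name `LRUCache`, its `capacity` parameter, and the `get`/`put` methods are hypothetical names chosen for illustration, not part of the design described in this lesson.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Hypothetical single-node cache: hash map lookups plus LRU eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # OrderedDict is a hash map that also remembers key order.
        # Convention here: least recently used key sits at the front.
        self._store: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._store:
            return None  # cache miss
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._store:
            self._store.move_to_end(key)  # refresh recency on overwrite
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Both `get` and `put` run in constant time on average, because the ordered hash map handles lookup and recency tracking in one structure. A production cache server would also need concurrency control and memory accounting, which this sketch leaves out.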
Maintain the cache servers list
We will address the service discovery problem first. The following slides ...