As data volumes grow, it becomes difficult and economically unviable to scale up database servers, that is, to buy a bigger server to run the database on. A more suitable strategy is to distribute the data among several servers. Aggregate orientation fits this strategy well because an aggregate is a natural unit of distribution.

Distribution models aim to handle larger quantities of data, higher throughput, and continued availability during both planned and unplanned events. Along with these benefits, distributing data across multiple servers adds complexity, and therefore cost, to the system.

There are mainly two techniques for data distribution: replication and partitioning. The two approaches are orthogonal: replication copies the same data across multiple servers, while partitioning puts different data on different servers. A system can use either of them, or both.
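To make the distinction concrete, here is a minimal sketch of hash-based partitioning in Python. The shard names and keys are hypothetical; the point is only that a record's key deterministically selects exactly one server, so different data ends up on different servers.

```python
import hashlib

# Hypothetical server names; a real deployment would map these to hosts.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    """Map a record key to exactly one shard, deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Each aggregate is stored on a single shard chosen by its key.
for order_id in ["order-1001", "order-1002", "order-1003"]:
    print(order_id, "->", shard_for(order_id))
```

Replication, by contrast, would copy every record to every server in the list rather than spreading records across them.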

Replication

Replication means maintaining copies of the same data on multiple machines connected over a network.

Replication brings benefits such as higher availability, but it also comes with its own complexities. If your data never changes once it is replicated, replication is straightforward: you just copy the data to all the nodes. The main difficulty arises when the replicated data changes over time. On every update, the replicas must be kept in sync with one another. There are many algorithms for replicating changes between nodes, but we will discuss just three popular ones: single-leader (primary-secondary), multi-leader, and leaderless (peer-to-peer) replication.
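To illustrate the first of these, here is a minimal sketch of single-leader replication, assuming simple in-memory nodes. The class names and the synchronous push are illustrative assumptions, not how any particular database implements it. All writes go through the leader, which propagates each change to its followers so the replicas stay in sync.

```python
class Node:
    """A replica holding its own copy of the data."""
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}

class Leader(Node):
    """The single node that accepts writes and replicates them."""
    def __init__(self, name: str, followers: list[Node]):
        super().__init__(name)
        self.followers = followers

    def write(self, key: str, value: str) -> None:
        # Apply the change locally, then push it to every follower
        # so all replicas converge on the same value.
        self.data[key] = value
        for follower in self.followers:
            follower.data[key] = value  # synchronous, for simplicity

followers = [Node("secondary-1"), Node("secondary-2")]
leader = Leader("primary", followers)
leader.write("user:42", "alice")
print(followers[0].data["user:42"])  # reads can go to any replica -> "alice"
```

This sketch omits what makes replication genuinely hard in practice, such as replication lag, follower failures, and leader failover; those concerns are the source of much of the complexity mentioned above.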
