As data volumes grow, it becomes difficult and economically unviable to scale up database servers, that is, to buy a bigger server to run the database on. A more suitable strategy is to distribute the data among several servers. Aggregate orientation fits this strategy well because an aggregate is a natural unit of distribution.

Distribution models aim to handle larger quantities of data, higher throughput, and continued availability during both planned and unplanned events. Along with these benefits, distributing data across multiple servers adds complexity, and therefore cost, to the system.

There are mainly two techniques for data distribution: replication and partitioning. The two approaches are orthogonal: replication copies the same data across multiple servers, while partitioning puts different data on different servers. A system can use either of them, or both.
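To make the distinction concrete, here is a minimal sketch of hash-based partitioning in Python. The shard names and keys are hypothetical; the point is only that a record's key deterministically selects exactly one server, so different data ends up on different servers.

```python
import hashlib

# Hypothetical server names; a real deployment would map these to hosts.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    """Map a record key to exactly one shard, deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Each aggregate is stored on a single shard chosen by its key.
for order_id in ["order-1001", "order-1002", "order-1003"]:
    print(order_id, "->", shard_for(order_id))
```

Replication, by contrast, would copy every record to every server in the list rather than spreading records across them.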

Replication

Replication means maintaining copies of the same data on multiple machines connected over a network.

Replication brings benefits such as higher availability, but it also comes with its own complexities. If your data never changes once it is replicated, replication is straightforward: you just copy the data to all the nodes. The main difficulty arises when the replicated data changes over time. On every update, the replicas must be kept in sync with one another. There are many algorithms for replicating changes between nodes, but we will discuss just three popular ones: single-leader (primary-secondary), multi-leader, and leaderless (peer-to-peer) replication.
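To illustrate the first of these, here is a minimal sketch of single-leader replication, assuming simple in-memory nodes. The class names and the synchronous push are illustrative assumptions, not how any particular database implements it. All writes go through the leader, which propagates each change to its followers so the replicas stay in sync.

```python
class Node:
    """A replica holding its own copy of the data."""
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}

class Leader(Node):
    """The single node that accepts writes and replicates them."""
    def __init__(self, name: str, followers: list[Node]):
        super().__init__(name)
        self.followers = followers

    def write(self, key: str, value: str) -> None:
        # Apply the change locally, then push it to every follower
        # so all replicas converge on the same value.
        self.data[key] = value
        for follower in self.followers:
            follower.data[key] = value  # synchronous, for simplicity

followers = [Node("secondary-1"), Node("secondary-2")]
leader = Leader("primary", followers)
leader.write("user:42", "alice")
print(followers[0].data["user:42"])  # reads can go to any replica -> "alice"
```

This sketch omits what makes replication genuinely hard in practice, such as replication lag, follower failures, and leader failover; those concerns are the source of much of the complexity mentioned above.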
