Scaling Database

Sharding data

Data is an asset for any organization. As a platform grows in popularity, it attracts more users, and the growing volume of data and concurrent read/write traffic puts scalability pressure on traditional databases. Traditional databases remain attractive because of properties like ACID transactions.

Over the decades, it has proved very hard to offer those same programming features in a distributed setting, where data is partitioned as needed and transactions still provide ACID-like properties.

One solution is to move the data to a NoSQL-like system. However, the large amount of historical code and its tight coupling with traditional databases make such a migration expensive.

Organizations might prefer to scale traditional databases with a third-party solution, but integrating one often brings its own complexities. More importantly, there are abundant opportunities to optimize for the specific problem at hand and get much better performance than a general-purpose solution provides.

There are third-party, general-purpose solutions available for scaling traditional databases. Vitess is one such solution that provides horizontal scaling for the MySQL database.

To divide the load across multiple machines, we need to partition the data. Generally, we shard the data in one of the following ways:

  • Vertical sharding
  • Horizontal sharding

Vertical sharding

We can put different tables in different database instances (each of which might run on a different physical server). We might also break a table into multiple tables so that some columns are in one table and the rest are in another. Care is needed when there are joins between multiple tables; we might prefer to keep such tables together on one partition.
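
As a rough illustration of the idea, the following minimal Python sketch splits a user record by columns across two database instances. Everything in it is an assumption made for the example: two in-memory SQLite databases stand in for separate database servers, and the table names, the TABLE_TO_SHARD routing map, and the helper functions are hypothetical. Because the two tables live on different instances, the join happens in application code, which is exactly the cost the paragraph above warns about.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two separate database
# instances (in practice, these could be different physical servers).
users_db = sqlite3.connect(":memory:")
profiles_db = sqlite3.connect(":memory:")

# Hypothetical routing map: each table lives on exactly one instance.
TABLE_TO_SHARD = {
    "users": users_db,
    "user_profiles": profiles_db,
}

# A wide user table is split by columns: frequently used columns stay in
# "users", while the remaining columns move to "user_profiles".
users_db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
profiles_db.execute(
    "CREATE TABLE user_profiles (user_id INTEGER PRIMARY KEY, bio TEXT)"
)


def shard_for(table):
    """Return the connection of the instance that holds the given table."""
    return TABLE_TO_SHARD[table]


def create_user(user_id, name, email, bio):
    """Write one logical user record as two rows on two instances."""
    shard_for("users").execute(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
        (user_id, name, email),
    )
    shard_for("user_profiles").execute(
        "INSERT INTO user_profiles (user_id, bio) VALUES (?, ?)",
        (user_id, bio),
    )


def load_user(user_id):
    """Reassemble a user by joining the two rows in application code."""
    user = shard_for("users").execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if user is None:
        return None
    profile = shard_for("user_profiles").execute(
        "SELECT bio FROM user_profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return {
        "id": user_id,
        "name": user[0],
        "email": user[1],
        "bio": profile[0] if profile else None,
    }
```

Calling create_user(1, "Ada", "ada@example.com", "Engineer") followed by load_user(1) touches both instances. Note that the two inserts are not covered by a single transaction in this sketch, so a failure between them could leave the record half written. Keeping tables that are joined or updated together on one partition avoids both the extra round trip and that consistency gap.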
