What is database sharding?

Database sharding is the process of partitioning the data in a database or search engine into smaller, distinct chunks called shards.

Each shard could be a table, a Postgres schema, or a different physical database held on a separate database server instance.

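Data can be split in two ways. In horizontal sharding, every shard has the same schema but holds a different subset of the rows, so a given row appears in only one shard. In vertical sharding, different columns or tables are placed on different shards. Some data, such as small lookup tables, may still be kept in every shard. The following figure illustrates vertical sharding and horizontal sharding.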
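As a rough illustration of the two approaches, the sketch below splits a small in-memory table both ways. The `users` table, its column names, and both helper functions are hypothetical and exist only for this example; a real database does this at the storage layer.

```python
# Hypothetical "users" table, used only for illustration.
users = [
    {"user_id": 1, "name": "Ada", "email": "ada@example.com", "bio": "Engineer"},
    {"user_id": 2, "name": "Bob", "email": "bob@example.com", "bio": "Designer"},
    {"user_id": 3, "name": "Cy", "email": "cy@example.com", "bio": "Writer"},
    {"user_id": 4, "name": "Di", "email": "di@example.com", "bio": "Analyst"},
]


def horizontal_shards(rows, num_shards):
    """Split by row: each shard keeps every column but only some rows."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        # Assumed placement rule: user_id modulo the number of shards.
        shards[row["user_id"] % num_shards].append(row)
    return shards


def vertical_shards(rows, column_groups, key="user_id"):
    """Split by column: each shard keeps every row but only some columns.

    The key column is copied into every shard so rows can be re-joined.
    """
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


by_row = horizontal_shards(users, 2)
by_column = vertical_shards(users, [("name", "email"), ("bio",)])
```

Here `by_row` contains two lists of complete user records, while `by_column` contains two lists that each cover all four users but carry different columns.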

To shard your data, you need to choose a key, called a shard key (or sharding key), to partition the data on. In a document database such as MongoDB, the shard key is typically an indexed field, or an indexed compound of fields, that exists in every document in the collection.

There is no general rule for selecting a sharding key; the right choice depends on your application. For instance, a social media app might use userID as the shard key.
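One common way to turn a shard key into a shard number is to hash it and take the result modulo the number of shards. The sketch below assumes string user IDs and a fixed count of four shards; `NUM_SHARDS` and `shard_for` are names made up for this example, and this is only one of several possible placement schemes.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a userID to a shard index in the range [0, num_shards)."""
    # Use a stable hash: Python's built-in hash() is randomized per
    # process for strings, so it would route inconsistently across runs.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same userID always maps to the same shard.
print(shard_for("user-42"))
```

Note that with this modulo scheme, changing `NUM_SHARDS` moves most keys to a different shard, so the shard count is not something to change casually.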


<img src="/api/edpresso/shot/4827308338708480/image/5741031244955648" alt="Markdown Monster icon" width="230" />

Sharding reduces the amount of data each query has to search. When a request includes the shard key, the application knows which shard to route it to, so it looks through only that shard's data rather than the whole database.
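The routing idea can be sketched with Python's built-in `sqlite3` module, using one in-memory database to stand in for each shard server. The `posts` table, the three-shard setup, and all function names here are assumptions made for illustration, not a production design.

```python
import sqlite3

NUM_SHARDS = 3  # assumed shard count

# One in-memory SQLite database stands in for each shard server.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE posts (user_id INTEGER, body TEXT)")


def shard_for(user_id: int) -> sqlite3.Connection:
    """Pick the shard that owns this user's rows (simple modulo rule)."""
    return shards[user_id % NUM_SHARDS]


def add_post(user_id: int, body: str) -> None:
    conn = shard_for(user_id)
    conn.execute(
        "INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body)
    )
    conn.commit()


def posts_by_user(user_id: int) -> list:
    """A query that includes the shard key touches exactly one shard."""
    rows = shard_for(user_id).execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY rowid", (user_id,)
    ).fetchall()
    return [body for (body,) in rows]


def count_all_posts() -> int:
    """A query without the shard key has to visit every shard."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
        for conn in shards
    )


add_post(1, "hello")
add_post(4, "second post")
add_post(2, "from another user")
print(posts_by_user(1))
print(count_all_posts())
```

The contrast between `posts_by_user` and `count_all_posts` shows the trade-off: lookups by shard key stay cheap, while queries that cannot use the key fan out to all shards.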

This can improve the performance of your application, and it eases scalability concerns because the database can grow by adding shards as the data grows.

Copyright ©2024 Educative, Inc. All rights reserved