Introduction to Megastore

Motivation

As desktop programs migrate to the cloud, interactive online services place new demands on storage systems. Email, shared documents, and social networking are growing rapidly and pushing the limits of existing infrastructure. Meeting the storage needs of these services is difficult because of the following requirements:

  1. Applications must be highly scalable because the Internet brings a vast audience of potential users. Using a traditional database such as MySQL as the datastore allows a service to be developed quickly, but scaling that service to millions of users demands a complete overhaul of the storage infrastructure.
  2. Organizations have to compete for users, which demands rapid product development and a short time-to-market. A NoSQL solution with a custom consistency model usually has a learning curve, and developers may end up forcing it onto a problem that is better suited to a traditional relational database.
  3. The service must be responsive, so the storage system must have low latency.
  4. The application should give the user a consistent view of the data. The result of a change should be visible immediately and should persist.
  5. The services should be highly available. The system should operate uninterrupted despite server or component failures.

Technology options

These requirements conflict with one another. Relational databases offer a rich set of features for building applications easily, but they are hard to scale to hundreds of millions of users. NoSQL datastores such as Google’s Bigtable are highly scalable, but their restricted API and weak consistency models make application development more difficult. Bigtable supports transactions only at the level of an individual key. For transactions that span many keys, applications must implement their own mechanisms, which makes the code complicated to write and maintain. It is also difficult to replicate data across distant data centers while maintaining low latency, and even more difficult to guarantee a consistent view of the replicated data, particularly during failures.

Hence, finding a globally scalable system that allows ACID (atomicity, consistency, isolation, and durability) transactions is hard.
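To make the single-key limitation described above concrete, here is a minimal sketch in Python. It is not Bigtable's API: `SingleKeyStore`, `add`, and `transfer` are hypothetical names for a toy in-memory store that is atomic only per key. The example assumes a simple balance transfer and simulates a crash between two single-key writes.

```python
import threading


class SingleKeyStore:
    """Toy in-memory store (hypothetical) that is atomic only per key."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)

    def compare_and_set(self, key, expected, new):
        """Atomically write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False
            self._data[key] = new
            return True


def add(store, key, delta):
    """Single-key read-modify-write, retried until the compare-and-set wins."""
    while True:
        current = store.get(key)
        if store.compare_and_set(key, current, current + delta):
            return


def transfer(store, src, dst, amount, crash_between=False):
    """Move `amount` from src to dst using two independent single-key updates."""
    add(store, src, -amount)
    if crash_between:
        # Simulated failure: the debit is applied, the credit never happens.
        raise RuntimeError("simulated crash between the two writes")
    add(store, dst, amount)


store = SingleKeyStore()
add(store, "alice", 100)
add(store, "bob", 100)

# A transfer that completes keeps the combined balance unchanged.
transfer(store, "alice", "bob", 30)
total_after_ok = store.get("alice") + store.get("bob")

# A transfer that fails halfway leaves the two keys inconsistent.
try:
    transfer(store, "alice", "bob", 30, crash_between=True)
except RuntimeError:
    pass
total_after_crash = store.get("alice") + store.get("bob")

print(total_after_ok, total_after_crash)
```

Each key is updated atomically, yet the combined balance changes after the simulated crash because nothing ties the two writes together. Closing that gap in application code (for example, with a write-ahead record or a compensating update) is the kind of extra mechanism the text refers to.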
