
Data Model of Bigtable

Learn about the data model of Bigtable.

We'll cover the following:

Rows
Columns
Column families
Timestamps
High-level design
  - API design

Bigtable is a sparse, distributed, persistent, multidimensional sorted map:

Sparse: It does not need to store an entry in every cell (row/column intersection).
Distributed: The table is spread across many physical servers via sharding.
Persistent: The data we store persists even after our session ends.
Multidimensional sorted map: A map is essentially a key/value store, and a multidimensional sorted map is like a sorted map of maps.

Traditional databases use a two-dimensional layout, where each cell is determined by a row ID and a column name. Bigtable, on the other hand, has the following four dimensions:

Row key: It uniquely identifies the row.
Column family: It groups related columns together.
Column name: It identifies a column within its column family.
Timestamp: A cell can hold several versions of a value, each uniquely identified by its timestamp.
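To make the four dimensions concrete, here is a minimal sketch that models the table as nested maps: row key → column family → column name → timestamp → value. The class `SparseTable` and its methods are hypothetical and are not Bigtable's API. The sketch also sorts row keys on demand, whereas Bigtable keeps rows sorted itself. The key point it shows is sparseness: a row stores only the columns that were actually written, so no empty entries exist for missing cells.

```python
from __future__ import annotations


class SparseTable:
    """Hypothetical in-memory model of Bigtable's four-dimensional map."""

    def __init__(self) -> None:
        # row key -> column family -> column name -> timestamp -> value
        self._rows: dict[str, dict[str, dict[str, dict[int, bytes]]]] = {}

    def put(self, row: str, family: str, column: str, ts: int, value: bytes) -> None:
        # Create nested maps only along the path being written (sparse storage).
        families = self._rows.setdefault(row, {})
        columns = families.setdefault(family, {})
        columns.setdefault(column, {})[ts] = value

    def row_keys(self) -> list[str]:
        # Row keys in sorted order (sorted on demand in this sketch).
        return sorted(self._rows)

    def families(self, row: str) -> list[str]:
        # Column families that actually hold data in this row.
        return sorted(self._rows.get(row, {}))

    def stored_values(self) -> int:
        # Count every stored (row, family, column, timestamp) entry.
        return sum(
            len(versions)
            for families in self._rows.values()
            for columns in families.values()
            for versions in columns.values()
        )


table = SparseTable()
table.put("row-b", "contents", "html", 3, b"<html>b</html>")
table.put("row-a", "contents", "html", 5, b"<html>a</html>")
table.put("row-a", "anchor", "example.com", 5, b"Example")

print(table.row_keys())         # ['row-a', 'row-b']
print(table.families("row-b"))  # ['contents'] (no empty 'anchor' entry)
print(table.stored_values())    # 3
```

Only three values are stored even though the two rows and three columns form six row/column intersections. Empty cells cost nothing.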

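The map is indexed by a row key, a column key, and a timestamp. The column key combines the column family and the column name, written as family:qualifier. Each value in the map is an uninterpreted array of bytes, because Bigtable treats all data as raw byte strings.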

(row: string, column: string, time: int64) → string
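The signature above can be sketched as a single flat map with an exact three-part lookup. This is a minimal illustration, assuming column keys use the family:qualifier form with a possibly empty qualifier. It stores values as Python `bytes` to mirror "uninterpreted arrays of bytes". The names `split_column_key` and `BigtableMap` are hypothetical and are not part of any Bigtable client library.

```python
from __future__ import annotations


def split_column_key(column: str) -> tuple[str, str]:
    """Split a 'family:qualifier' column key; the qualifier may be empty."""
    family, sep, qualifier = column.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column!r}")
    return family, qualifier


class BigtableMap:
    """Hypothetical sketch of the (row, column, time) -> bytes map."""

    def __init__(self) -> None:
        self._cells: dict[tuple[str, str, str, int], bytes] = {}

    def put(self, row: str, column: str, time: int, value: bytes) -> None:
        if not isinstance(value, bytes):
            # Values are uninterpreted bytes; callers must encode them first.
            raise TypeError("value must be bytes")
        family, qualifier = split_column_key(column)
        self._cells[(row, family, qualifier, time)] = value

    def get(self, row: str, column: str, time: int) -> bytes | None:
        # Exact lookup on all three indices; None when that version is absent.
        family, qualifier = split_column_key(column)
        return self._cells.get((row, family, qualifier, time))


m = BigtableMap()
m.put("educative.io", "contents:", 3, "<html>...</html>".encode("utf-8"))
print(m.get("educative.io", "contents:", 3))  # b'<html>...</html>'
print(m.get("educative.io", "contents:", 2))  # None
```

Keeping values as opaque bytes leaves encoding decisions (UTF-8 text, serialized records, images) to the client. This matches the text's point that Bigtable does not interpret the data it stores.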

Google designed Bigtable to store large amounts of data efficiently. For instance, to store web pages, the row key could be a page's URL (e.g., educative.io), and a column key such as contents: indicates that the cell holds the contents of the page at that URL. The timestamp could be the time at which we crawled the web and fetched that page, and the value is the page's content. This gives a three-dimensional table indexed by row, column, and time. If we crawled the page at earlier times, the older versions simply remain in the table alongside the newest one.
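The web-crawl example can be sketched as a cell that keeps one version per crawl time. The class `CrawlTable`, its methods, and the small integer timestamps are assumptions made for readability. Real timestamps are 64-bit integers, matching the int64 in the signature above. `read_as_of` is a hypothetical helper that returns the newest version no later than a given time. Bigtable itself stores a cell's versions in decreasing timestamp order so the most recent version is read first, and `versions` returns them in that order.

```python
from __future__ import annotations


class CrawlTable:
    """Hypothetical sketch: row = URL, column = 'contents:', timestamp = crawl time."""

    def __init__(self) -> None:
        # (row, column) -> {timestamp: page contents}
        self._cells: dict[tuple[str, str], dict[int, bytes]] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        # Older crawls are kept; writing the same timestamp again overwrites it.
        self._cells.setdefault((row, column), {})[ts] = value

    def versions(self, row: str, column: str) -> list[tuple[int, bytes]]:
        # All versions, newest first (decreasing timestamp order).
        return sorted(self._cells.get((row, column), {}).items(), reverse=True)

    def read_as_of(self, row: str, column: str, ts: int) -> bytes | None:
        # Newest version whose timestamp is <= ts, or None if none exists.
        for version_ts, value in self.versions(row, column):
            if version_ts <= ts:
                return value
        return None


crawl = CrawlTable()
for crawl_time, html in [(100, b"<html>v1</html>"),
                         (200, b"<html>v2</html>"),
                         (300, b"<html>v3</html>")]:
    crawl.put("educative.io", "contents:", crawl_time, html)

print(crawl.versions("educative.io", "contents:")[0])      # (300, b'<html>v3</html>')
print(crawl.read_as_of("educative.io", "contents:", 250))  # b'<html>v2</html>'
print(crawl.read_as_of("educative.io", "contents:", 50))   # None
```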

Rows

Every row in Bigtable has a row key, which is an arbitrary string of up to 64 KB (most keys are much smaller). Every read or write of data under a single row key is atomic (an atomic operation is an indivisible and irreducible series of database operations such that either all of them occur or none do; source: Wikipedia), regardless of the ...
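The row-level atomicity described above can be illustrated with a per-row lock. This is a swapped-in technique chosen for the sketch, and the text does not say how Bigtable actually implements it. `RowAtomicTable`, `mutate_row`, `read_row`, and the `meta:a`/`meta:b` columns are hypothetical. The 64 KB key check follows the limit stated in the text.

```python
from __future__ import annotations

import threading
from collections import defaultdict

MAX_ROW_KEY_BYTES = 64 * 1024  # row keys may be up to 64 KB


class RowAtomicTable:
    """Hypothetical sketch: one lock per row makes each single-row read or write atomic."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, bytes]] = {}
        self._row_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
        self._registry_lock = threading.Lock()  # guards creation of per-row locks

    def _lock_for(self, row: str) -> threading.Lock:
        with self._registry_lock:
            return self._row_locks[row]

    def mutate_row(self, row: str, updates: dict[str, bytes]) -> None:
        if len(row.encode("utf-8")) > MAX_ROW_KEY_BYTES:
            raise ValueError("row key exceeds 64 KB")
        with self._lock_for(row):
            # All updates are applied while holding the row's lock, so a reader
            # (which takes the same lock) never observes a partial mutation.
            self._rows.setdefault(row, {}).update(updates)

    def read_row(self, row: str) -> dict[str, bytes]:
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))


def writer(table: RowAtomicTable, tag: bytes) -> None:
    for _ in range(1000):
        # Both columns of the same row are written together in one mutation.
        table.mutate_row("educative.io", {"meta:a": tag, "meta:b": tag})


table = RowAtomicTable()
threads = [threading.Thread(target=writer, args=(table, tag)) for tag in (b"x", b"y")]
for t in threads:
    t.start()
for t in threads:
    t.join()

row = table.read_row("educative.io")
print(row["meta:a"] == row["meta:b"])  # True: never a mix of b"x" and b"y"
```

Locking per row rather than per table lets writes to different rows proceed in parallel. Only operations on the same row key are serialized.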
