Data Model of Bigtable

Learn about the data model of Bigtable.

A data model and an associated API are the cornerstones of any database. In this lesson, we will learn how Bigtable builds a table abstraction on top of a key-value store, along with the operations for manipulating tables and data.

Bigtable is a sparse, distributed, persistent, multi-dimensional sorted map. Sparse means that it does not need to store an entry in every cell (row/column intersection). Distributed means that the table is spread across many physical servers via sharding. Persistent means that the data we store persists even after our session ends. A map is essentially a key/value store, and a multi-dimensional sorted map is like a sorted map of maps.

In traditional databases, we have two-dimensional layouts, where each cell is determined by a row ID and a column name. Bigtable, on the other hand, has the following four dimensions.

  1. Row key: It uniquely determines the row.
  2. Column family: It identifies a group of columns.
  3. Column name: It identifies a column within its column family.
  4. Timestamp: A column can hold multiple versions of a value, each uniquely identified by its timestamp.

The map is indexed by a row key, a column key (the column family together with the column name), and a timestamp. Every value in the map is an uninterpreted array of bytes, so Bigtable treats all data as raw byte strings.

(row: string, column: string, time: int64) → string
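To make the signature above concrete, here is a minimal Python sketch that assumes an in-memory dictionary can stand in for the map. The names CellKey, put, and get, as well as the sample keys, are hypothetical and are not part of Bigtable's API. The sketch only illustrates the shape of the key and the raw-bytes value.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key type for illustration: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]

# An in-memory dict stands in for the map; values are uninterpreted bytes.
table: Dict[CellKey, bytes] = {}


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Store raw bytes under the three-part key."""
    table[(row, column, timestamp)] = value


def get(row: str, column: str, timestamp: int) -> Optional[bytes]:
    """Return the bytes stored under the exact key, or None for an empty cell."""
    return table.get((row, column, timestamp))


# Sample keys are made up; "family:name" mimics a family plus a column name.
put("row2", "family:name", 1, b"third")
put("row1", "family:name", 2, b"second")
put("row1", "family:name", 1, b"first")

# Python's tuple ordering sorts by row, then column, then timestamp
# (ascending here, which is an illustrative choice).
sorted_keys = sorted(table)
```

Because the store is sparse, a cell that was never written simply has no entry, which is why get returns None instead of a placeholder value.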

Google designed Bigtable to store large amounts of data efficiently. For instance, if we are storing web pages, the row key will be a URL, e.g., educative.io. The column key indicates that we are storing the contents of the web page at that URL. The timestamp can be, for example, the time at which we crawled the web and fetched that page. The value is the content of the web page. Indexed by row key, column key, and timestamp, this is a three-dimensional table. If we crawled the web at earlier times, we simply leave the older versions in the table.
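The web page example can be sketched as a map of maps in Python. This is only an illustration under stated assumptions: the names webtable, record_crawl, and latest, the column key "contents", and the timestamps are all hypothetical, and a nested in-memory dictionary replaces Bigtable's distributed storage.

```python
from collections import defaultdict
from typing import Dict, Optional

# row key (URL) -> column key -> timestamp -> raw page bytes
WebTable = Dict[str, Dict[str, Dict[int, bytes]]]

webtable: WebTable = defaultdict(lambda: defaultdict(dict))


def record_crawl(url: str, column: str, crawl_time: int, page: bytes) -> None:
    """Add a new version of the cell; older versions stay in the table."""
    webtable[url][column][crawl_time] = page


def latest(url: str, column: str) -> Optional[bytes]:
    """Return the version with the largest timestamp, or None if the cell is empty."""
    versions = webtable.get(url, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


# Two crawls of the same URL at made-up times produce two versions.
record_crawl("educative.io", "contents", 100, b"<html>old page</html>")
record_crawl("educative.io", "contents", 200, b"<html>new page</html>")

newest = latest("educative.io", "contents")
```

Each crawl adds an entry under a new timestamp instead of overwriting the previous one, which mirrors how older versions of a page remain available in the table.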

Rows

All rows in Bigtable have an associated row key, which is an arbitrary string of up to 64 KB in size (most keys are much smaller than 64 KB). An atomic operation is an indivisible and irreducible series of database operations such that either all occur or none occurs (source: Wikipedia). Each read and write of data under a single row key is atomic regardless of the ...