Appends and Read Operations in HBase

Let's look at why appends are more efficient than random writes and how HBase reduces the cost of read operations.

Appends

Appends are more efficient than random writes, especially in a filesystem like HDFS. Region servers try to take advantage of this fact by employing the following components for storage and data retrieval.

MemStore

MemStore is used as a write cache. Writes first land in this data structure, which is held in memory and can therefore be sorted efficiently. The buffered writes are periodically flushed to HDFS in sorted order.
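The buffering behavior described above can be illustrated with a small sketch. This is a toy model, not HBase's implementation: the class name `ToyMemStore`, the entry-count `flush_threshold`, and the `flushed` list standing in for files on HDFS are all assumptions made for illustration.

```python
import bisect


class ToyMemStore:
    """Toy write buffer: keeps row keys sorted in memory, flushes them in one pass."""

    def __init__(self, flush_threshold=3):
        # Hypothetical limit, counted in entries rather than bytes.
        self.flush_threshold = flush_threshold
        self._keys = []     # row keys, kept in sorted order
        self._values = {}   # latest value for each row key
        self.flushed = []   # stand-in for sorted files written to disk

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving sort order
        self._values[key] = value
        if len(self._keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Emit the buffered entries as one sorted, immutable batch.
        batch = tuple((k, self._values[k]) for k in self._keys)
        if batch:
            self.flushed.append(batch)
        self._keys, self._values = [], {}


store = ToyMemStore(flush_threshold=3)
for key, value in [("row3", "c"), ("row1", "a"), ("row2", "b")]:
    store.put(key, value)
print(store.flushed)
```

The writes arrive in arbitrary key order, but the flushed batch is sorted, so it can be written to disk in a single sequential pass instead of as scattered random writes.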

HFile

An HFile is the file in HDFS that stores sorted key-value entries on disk.
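Why sorted entries matter for reads can be shown with a minimal sketch. It models only the "sorted key-value entries" idea with in-memory lists; the name `ToySortedFile` and its methods are hypothetical and say nothing about the real on-disk HFile format.

```python
import bisect


class ToySortedFile:
    """Toy immutable file of key-value entries, sorted by key."""

    def __init__(self, entries):
        ordered = sorted(entries, key=lambda kv: kv[0])
        self._keys = [k for k, _ in ordered]
        self._values = [v for _, v in ordered]

    def get(self, key):
        # Binary search instead of scanning every entry.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, stop):
        # Entries with start <= key < stop sit next to each other.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


hfile = ToySortedFile([("row2", "b"), ("row1", "a"), ("row4", "d")])
print(hfile.get("row2"))
print(hfile.scan("row1", "row3"))
```

Because the entries are sorted, a point lookup needs only a binary search, and a range scan reads one contiguous slice.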

Write-ahead log (WAL)

The WAL records operations that have not yet been persisted to permanent storage and so far exist only in the MemStore. The WAL is also stored in HDFS and is used for recovery if a region server fails.
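The recovery role of the log can be sketched as follows. This is a simplified model under stated assumptions: a local temporary file stands in for HDFS, a plain dictionary stands in for the MemStore, and `ToyWAL` and `simulate_crash_and_recovery` are hypothetical names that do not come from HBase.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy append-only log of writes that so far exist only in memory."""

    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        # Record the operation durably before it is applied in memory.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def replay(self):
        # Rebuild the lost in-memory state by re-applying entries in log order.
        recovered = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    recovered[entry["key"]] = entry["value"]
        return recovered


def simulate_crash_and_recovery():
    with tempfile.TemporaryDirectory() as tmp:
        wal = ToyWAL(os.path.join(tmp, "toy.wal"))
        memstore = {}
        for key, value in [("row1", "a"), ("row2", "b"), ("row1", "c")]:
            wal.append(key, value)  # log first
            memstore[key] = value   # then apply in memory
        memstore = None  # simulated crash: the in-memory buffer is lost
        return wal.replay()


print(simulate_crash_and_recovery())
```

Every write in this sketch only appends to the end of the log, so durability is obtained without any random writes. Replaying the log after the simulated crash restores the latest value for each row.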

BlockCache

BlockCache is the read cache. It keeps frequently read data in memory and evicts the least recently used data when the cache is full.
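The least-recently-used eviction policy mentioned above can be sketched in a few lines. This is a generic LRU cache built on Python's `OrderedDict`, offered as an illustration only; `ToyBlockCache`, its entry-count `capacity`, and the block identifiers are assumptions, and HBase's actual cache is not claimed to work this way internally.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy read cache that evicts the least recently used block when full."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical limit, counted in blocks
        self._blocks = OrderedDict()  # oldest access first, newest last

    def get(self, block_id):
        if block_id not in self._blocks:
            return None  # cache miss: the caller would read from disk
        self._blocks.move_to_end(block_id)  # mark as most recently used
        return self._blocks[block_id]

    def put(self, block_id, data):
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop the least recently used


cache = ToyBlockCache(capacity=2)
cache.put("block-a", b"aaa")
cache.put("block-b", b"bbb")
cache.get("block-a")          # block-a is now the most recently used
cache.put("block-c", b"ccc")  # cache is full, so block-b is evicted
print(cache.get("block-b"))
```

Reading `block-a` just before the third insert keeps it in the cache, while the untouched `block-b` is the one that gets evicted.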
