High Availability

This lesson explains how high availability is implemented for HDFS.

We'll cover the following

High Availability
Split brain scenario
Fencing
Shared NFS directory

High Availability

High availability is a desirable characteristic of a distributed system. It is defined as the ability of a system or system component to remain continuously operational for a long period of time. For example, Amazon’s ubiquitous S3 storage is designed for 99.99% availability over a given year.
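To make an availability percentage concrete, it helps to convert it into permitted downtime. The following is a minimal sketch; the function name `allowed_downtime_minutes` and the 365-day year are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Compare a few common availability targets over one (non-leap) year.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% availability -> {allowed_downtime_minutes(pct):.1f} minutes of downtime per year")
```

At 99.99% availability, the budget works out to a little under 53 minutes of downtime per year.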

To achieve high availability for HDFS, we need more than one instance of the Namenode so that failures and software/hardware upgrades don't cause downtime. In an HA setup, one Namenode serves client queries and is known as the active Namenode. The rest are known as standby Namenodes. If the active Namenode experiences a failure, a standby Namenode takes over.

Working

Imagine a cluster with two Namenodes. For the standby Namenode to take over successfully in case of failure, it must replay exactly the changes the active Namenode makes to its namespace state. This is done by deploying JournalNodes. Like a journal, the JournalNodes keep a record of every change the active Namenode makes to its namespace, and the standby Namenode reads those changes and applies them to its own namespace. Because JournalNodes are themselves prone to failure, we run more than one of them, and each change must be recorded on a majority of the JournalNodes.
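The majority-write idea can be sketched as a toy simulation. This is not how HDFS implements it; the `JournalNode` class and `write_edit` function below are hypothetical names invented for illustration, and the real protocol handles much more (for example, recovery and fencing of a stale writer).

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode (hypothetical, for illustration only)."""
    name: str
    alive: bool = True
    edits: list = field(default_factory=list)

    def append(self, edit: str) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.alive:
            return False
        self.edits.append(edit)
        return True


def write_edit(journal_nodes: list, edit: str) -> bool:
    """Send the edit to every node; succeed only if a majority acknowledges it."""
    acks = sum(1 for jn in journal_nodes if jn.append(edit))
    majority = len(journal_nodes) // 2 + 1
    return acks >= majority


nodes = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]
nodes[2].alive = False  # one of three JournalNodes is down
print(write_edit(nodes, "mkdir /data"))  # two acknowledgements out of three is a majority

nodes[1].alive = False  # a second JournalNode goes down
print(write_edit(nodes, "mkdir /logs"))  # only one acknowledgement, no majority
```

With one node down the write still succeeds; with two down it fails, because a single acknowledgement is not a majority of three.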

How do we define the majority in case of an even number of JournalNodes? We sidestep the question by running an odd number of JournalNodes, with at least three. With three nodes, the active Namenode must write each modification of its namespace to at least two of the three JournalNodes. This allows for tolerating a single machine failure. The following formula gives the number of JournalNode failures the system can tolerate:

$\frac{N-1}{2}$

where N is the number of JournalNodes in the HA setup. Plug in various values of N to see how many JournalNodes can go down before the system becomes unusable.
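As a quick way to plug in numbers, here is a small sketch of the formula. The function names `tolerated_failures` and `majority` are assumptions made for this example, and integer division is used so that even values of N can be compared as well.

```python
def majority(n: int) -> int:
    """Smallest number of JournalNodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of JournalNodes that can fail while a majority remains: (n - 1) / 2, rounded down."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return (n - 1) // 2


for n in (3, 4, 5, 7):
    print(f"N={n}: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

Note that four JournalNodes tolerate only one failure, the same as three. Adding a node to reach an even count adds cost without adding fault tolerance, which is one reason to run an odd number.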

The illustration below shows a typical HDFS setup with JournalNodes. It also includes Zookeeper (discussed later) and the Failover Controller, abbreviated as ZK and FC respectively in the illustration. The FC is a Zookeeper-related component that periodically pings the Namenode for health checks and reports any failures back to Zookeeper.
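The Failover Controller's ping-and-report role can be pictured with a toy loop. This is only a sketch under stated assumptions: `monitor_namenode`, `check_health`, and `report_failure` are hypothetical names, and the two callables stand in for the actual health-check call to the Namenode and the actual report to Zookeeper, neither of which is modeled here.

```python
import time
from typing import Callable


def monitor_namenode(
    check_health: Callable[[], bool],
    report_failure: Callable[[], None],
    max_checks: int,
    interval_seconds: float = 0.0,
) -> bool:
    """Ping the Namenode up to max_checks times; report and stop on the first failed check.

    Returns True if every check passed, False if a failure was reported.
    """
    for _ in range(max_checks):
        if not check_health():
            report_failure()  # stand-in for notifying Zookeeper
            return False
        time.sleep(interval_seconds)
    return True


# Simulated Namenode that answers two health checks and then stops responding.
responses = iter([True, True, False])
reported = []
healthy = monitor_namenode(
    check_health=lambda: next(responses),
    report_failure=lambda: reported.append("active Namenode unhealthy"),
    max_checks=5,
)
print(healthy, reported)
```

The loop stops at the third check, where the simulated Namenode fails, and records a single failure report.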
