High Availability

Explore the concept of high availability in the Hadoop Distributed File System. Understand how multiple Namenodes, JournalNodes, and fencing techniques work together to prevent downtime, handle failures, and maintain uninterrupted access to your Big Data storage.

We'll cover the following...

High Availability
Split brain scenario
Fencing
Shared NFS directory

High Availability

High availability is a desirable characteristic of a distributed system. It is defined as the ability of a system or system component to remain continuously operational for a long period of time. For example, Amazon’s widely used S3 storage is designed for 99.99% availability over a given year.
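To make an availability percentage concrete, it helps to convert it into a downtime budget. The snippet below is a minimal sketch of that arithmetic. The function name `allowed_downtime_minutes` is our own, and a 365-day year is assumed.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, that an availability target permits."""
    total_minutes = period_days * 24 * 60  # minutes in the period (525,600 for 365 days)
    unavailable_fraction = 1 - availability_percent / 100
    return total_minutes * unavailable_fraction


# Downtime budget for a 99.99% target over one year.
print(round(allowed_downtime_minutes(99.99), 2))
```

Under these assumptions, 99.99% availability leaves a budget of roughly 52.56 minutes of downtime per year.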

To achieve high availability for HDFS, we need more than one instance of the Namenode, so that a failure or a software/hardware upgrade does not cause downtime. In an HA setup, one Namenode serves client queries and is known as the active Namenode. The rest are known as standby Namenodes. If the active Namenode fails, a standby Namenode takes over.
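The active/standby arrangement can be illustrated with a toy model. This is only a sketch of the idea, not Hadoop code: the `Namenode` and `HACluster` classes and the `failover_if_needed` method are hypothetical names invented for illustration. It also ignores the coordination and fencing that a real HDFS failover requires.

```python
from dataclasses import dataclass


@dataclass
class Namenode:
    """Hypothetical stand-in for a Namenode process (not a Hadoop class)."""
    name: str
    healthy: bool = True


@dataclass
class HACluster:
    """Toy model: exactly one Namenode is active, the rest are standbys."""
    namenodes: list
    active_index: int = 0

    @property
    def active(self) -> Namenode:
        return self.namenodes[self.active_index]

    def failover_if_needed(self) -> Namenode:
        """Keep the current active Namenode if healthy, else promote a healthy standby."""
        if self.active.healthy:
            return self.active
        for index, namenode in enumerate(self.namenodes):
            if namenode.healthy:
                self.active_index = index  # promote this standby to active
                return namenode
        raise RuntimeError("no healthy Namenode available")


cluster = HACluster([Namenode("nn1"), Namenode("nn2")])
cluster.namenodes[0].healthy = False      # simulate a failure of the active Namenode
print(cluster.failover_if_needed().name)  # the standby nn2 is promoted
```

The sketch keeps a single `active_index` so that, in this model, only one Namenode can be considered active at any moment.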

Working

Imagine a cluster with two ...
