How to build highly available/fault-tolerant services in Node.js

While working for an important client, I began thinking about non-functional requirements (NFRs) for high availability and recovery. Our tech stack included Cassandra and Kafka, two distributed systems whose internal behavior I studied.

Kafka uses ZooKeeper to keep track of the partitions assigned to each consumer; Cassandra runs a gossip algorithm between nodes and divides data into partition ranges.

So I started to wonder whether there was a library (not an external service like ZooKeeper) with a gossip algorithm already implemented, so that people could build new distributed systems more easily.

I could not find such a library, so I created ring-election.

You can integrate ring-election into your Node.js process and get some important NFRs out of the box.

What the ring-election driver offers you:

A default partitioner that, given an object, returns the partition to which it is assigned.
A leader election mechanism.
Failure detection between nodes.
Assignment and rebalancing of partitions between nodes.
Automatic re-election of the leader.
Listening for newly assigned and revoked partitions.
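
To make the partitioner and the assignment/rebalancing ideas above more concrete, here is a minimal sketch in plain JavaScript. It is not the ring-election implementation or its API: the function names `partitionFor` and `assignPartitions`, the FNV-1a style hash, and the round-robin assignment are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch, NOT ring-election's actual code or API.

// Map a key to one of `numPartitions` partitions with a stable hash
// (FNV-1a style, computed over the UTF-16 code units of the key).
function partitionFor(key, numPartitions) {
  let hash = 0x811c9dc5;
  const text = String(key);
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // `>>> 0` converts the result to an unsigned 32-bit integer.
  return (hash >>> 0) % numPartitions;
}

// Spread partitions 0..numPartitions-1 across the given nodes round-robin.
// Assumption: rebalancing simply recomputes the assignment for the live nodes.
function assignPartitions(nodeIds, numPartitions) {
  const assignment = new Map(nodeIds.map((id) => [id, []]));
  if (nodeIds.length === 0) return assignment;
  for (let p = 0; p < numPartitions; p++) {
    assignment.get(nodeIds[p % nodeIds.length]).push(p);
  }
  return assignment;
}

// Three nodes share six partitions; then node "b" leaves and we rebalance.
const before = assignPartitions(['a', 'b', 'c'], 6);
const after = assignPartitions(['a', 'c'], 6);
console.log(before.get('b')); // partitions owned by "b" before it left
console.log(after.get('a'));  // partitions owned by "a" after rebalancing
```

A real system would likely try to move as few partitions as possible during a rebalance; recomputing everything from scratch keeps the sketch short.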

What problems can you solve with this driver?

Scalability
High Availability
Concurrency between nodes in a cluster
Automatic Failover
Gossip between nodes

How it works under the hood

Terminology:

Leader – the node that will handle the cluster and has assigned partitions.
Follower – a node that will have assigned partitions and will work on them.
Heartbeat – a message sent periodically from each follower node to the leader node so that the leader knows the follower is alive.
Heartcheck – a process that runs on the leader and checks the last heartbeat received by each follower.
Priority – a value assigned to each follower based on the time it joined the cluster. When a node dies, the priority is decreased by one. If the leader dies, the node with the lowest priority becomes the leader.
Node id – each follower node has an assigned id that is unique to the cluster.
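
The terms above can be tied together in a small in-memory sketch of a leader's view of the cluster. This is not ring-election's code: the `ClusterView` class, its method names, and the timeout value are hypothetical. It also assumes that "the priority is decreased by one" applies to the followers that joined after the dead node, which the text does not spell out.

```javascript
// Hypothetical sketch, NOT ring-election's actual code or API.
class ClusterView {
  constructor(heartbeatTimeoutMs) {
    this.heartbeatTimeoutMs = heartbeatTimeoutMs;
    this.followers = new Map(); // node id -> { priority, lastHeartbeat }
  }

  // A follower joins: priority reflects join order (1 = earliest joiner).
  join(nodeId, now) {
    this.followers.set(nodeId, {
      priority: this.followers.size + 1,
      lastHeartbeat: now,
    });
  }

  // Heartbeat: record the time of the latest message from a follower.
  heartbeat(nodeId, now) {
    const follower = this.followers.get(nodeId);
    if (follower) follower.lastHeartbeat = now;
  }

  // Heartcheck: drop followers whose last heartbeat is older than the timeout.
  heartcheck(now) {
    const dead = [];
    for (const [id, follower] of this.followers) {
      if (now - follower.lastHeartbeat > this.heartbeatTimeoutMs) dead.push(id);
    }
    for (const id of dead) this.remove(id);
    return dead;
  }

  remove(nodeId) {
    const gone = this.followers.get(nodeId);
    if (!gone) return;
    this.followers.delete(nodeId);
    // Assumption: followers that joined later have their priority decreased by one.
    for (const follower of this.followers.values()) {
      if (follower.priority > gone.priority) follower.priority -= 1;
    }
  }

  // If the leader dies, the follower with the lowest priority takes over.
  electLeader() {
    let winnerId = null;
    let lowest = Infinity;
    for (const [id, follower] of this.followers) {
      if (follower.priority < lowest) {
        lowest = follower.priority;
        winnerId = id;
      }
    }
    return winnerId;
  }
}

// Usage: n1 stops sending heartbeats and is detected by the heartcheck.
const view = new ClusterView(1000);
view.join('n1', 0);
view.join('n2', 0);
view.join('n3', 0);
view.heartbeat('n2', 900);
view.heartbeat('n3', 900);
const dead = view.heartcheck(1500);
const nextLeader = view.electLeader();
console.log(dead, nextLeader);
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test.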