We have studied how to scale the system and covered concepts such as partitioning, caching, load balancing, and the platform layer. Our system may now be scalable, but it is not yet redundant.

As a next step, we should identify all the bottlenecks of our existing system.

Identifying and resolving bottlenecks

In this section, we should discuss as many bottlenecks as we can, along with the approaches for mitigating each one. The issues range from a lack of redundancy, to single points of failure, to the broader question of how to keep the system reliable.

Let us cover a few topics that will add reliability and redundancy to our system.

There are two ways to increase availability and add redundancy, both of which also make the system more reliable: failover and replication.

Failover

Failover is an “active-passive” approach: when the primary component goes down, the standby hardware becomes active and takes over its work.

For this, we need an extra set of hardware resources that may remain passive most of the time. Failover should also be automatic, so that downtime is kept as short as possible. In short, failover is not as simple as it sounds: it adds complexity and leaves resources idle.
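
The active-passive behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Node` and `FailoverPair` classes, the `healthy` flag, and the retry-once policy are all assumptions made for the example. A real system would typically detect failure with health checks or heartbeats rather than a caught exception.

```python
class Node:
    """A hypothetical service instance that is either healthy or down."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverPair:
    """Sends traffic to the active node and promotes the standby on failure."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Automatic failover: swap roles, then retry once on the new active node.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


primary = Node("primary")
standby = Node("standby")
pair = FailoverPair(primary, standby)

print(pair.handle("req-1"))  # served by the primary
primary.healthy = False      # simulate a crash of the primary
print(pair.handle("req-2"))  # served by the standby after failover
```

Note that the standby does no work until the primary fails, which is the "unused resources" cost mentioned above.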

Replication

We may need a replica of almost everything, so that no component is a single point of failure. Replication has several benefits:

  • Low latency, because replication lets us keep the data closer to the user.
  • High availability, because if one machine stops working, a replica can take over.
  • High throughput, because data can be read from multiple copies.
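
The availability benefit can be made concrete with a little arithmetic. The sketch below assumes that replicas fail independently and that each one is up 99% of the time; both are illustrative assumptions, and the function name `combined_availability` is made up for this example. Under those assumptions, the chance that at least one of n replicas is up is 1 - (1 - p)^n.

```python
def combined_availability(per_replica, replicas):
    """Probability that at least one of `replicas` independent copies is up."""
    return 1 - (1 - per_replica) ** replicas


# Assumed figure: each replica is available 99% of the time.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

With the assumed 99% figure, two replicas give roughly 99.99% availability. Real failures are often correlated (shared power, network, or software bugs), so the actual gain is usually smaller.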

The challenge with replication is keeping the multiple copies synchronized. There are different techniques for replicating data.

Leader-follower replication

In a leader-follower architecture, write requests go to the leader first, and the leader then propagates the changes to its followers. The leader records every change in a log called the replication log, which helps keep the changes consistent if a failure occurs.

Whenever a request comes in to read the data, it is served by one of the followers. All followers hold a copy of the data, but only the leader handles writes.

This architecture is highly available because if any follower dies, the remaining followers can still serve reads.
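
The write and read paths described above can be sketched as follows. This is a simplified, in-memory illustration under several assumptions: the `Leader`, `Follower`, and `Cluster` classes and the `alive` flag are hypothetical names, the leader pushes the log to followers synchronously on every write, and leader failure is not handled at all.

```python
class Follower:
    """Holds a copy of the data and applies entries from the leader's log."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def apply(self, log):
        # Apply only the entries this follower has not seen yet.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

    def read(self, key):
        return self.data.get(key)


class Leader:
    """Accepts all writes, records them in a replication log, and pushes them out."""

    def __init__(self, followers):
        self.data = {}
        self.log = []  # the replication log: an ordered list of changes
        self.followers = followers

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value
        for follower in self.followers:
            if follower.alive:
                follower.apply(self.log)


class Cluster:
    """Routes writes to the leader and reads to the first live follower."""

    def __init__(self, follower_count):
        self.followers = [Follower(f"follower-{i}") for i in range(follower_count)]
        self.leader = Leader(self.followers)

    def write(self, key, value):
        self.leader.write(key, value)

    def read(self, key):
        for follower in self.followers:
            if follower.alive:
                return follower.read(key)
        raise RuntimeError("no follower available")


cluster = Cluster(2)
cluster.write("user:1", "Ada")
cluster.followers[0].alive = False  # simulate one follower dying
cluster.write("user:2", "Lin")
print(cluster.read("user:2"))       # still served, by the remaining follower
```

Because each follower tracks how much of the log it has applied, a follower that comes back can catch up from the log on the next write instead of copying all the data again.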
