Facebook: Optimized Datacenter Resource Allowance System
We'll cover the following
- Introduction
- Challenges in providing guaranteed capacity
- Prior solutions
- RAS solution by Facebook
- Resource management realities
- Region layout
- Hardware heterogeneity
- Impact of hardware heterogeneity on services
- Diverse capacity requests
- Server unavailability events
- RAS design
- RAS evaluation
Introduction
A cluster manager runs on a set of nodes and manages a cluster. It works with cluster agents to handle the entire cluster, including placing and managing containers or virtual machines on servers. Efficiently allocating resources in data centers is a challenging task for cluster managers and has been under study for decades.
Public clouds have adopted various cluster management systems, including open-source systems such as Kubernetes and proprietary systems such as Google’s Borg, Facebook’s Twine, and Microsoft’s Protean.
Capacity reservation allows us to reserve computing instances in advance so that they are available during critical events such as unscheduled maintenance, disaster recovery, or unusual workload surges.
The problem with recent approaches is that it remains unclear how to provide guaranteed capacity despite large-scale failures in data centers.
In this lesson, we describe how Facebook solved this problem for its on-premises infrastructure.