Characteristics of Distributed Systems

This lesson discusses the various cornerstones of distributed systems: reliability, scalability, availability, consistency, and maintainability.

Whenever we deal with distributed systems, there are five attributes particular to such systems that we must be aware of and understand.

Reliability

Reliability is a system’s ability to continue working correctly in spite of faults. A distributed system is usually made up of several smaller sub-components that work together to deliver a service. A reliable system can be counted on to keep working, without degraded service, when a part of the overall system fails. The concept can be extended to include a system’s ability to deliver its expected functionality, tolerate human error and unexpected use, maintain performance under high load and data volume, and prevent unauthorized use or abuse when failures do happen.

Fault vs. failure

The terms “fault” and “failure” are often used interchangeably but mean different things. A fault occurs when one component of the system deviates from its spec. When a part of a system experiences a fault but the system as a whole continues to operate correctly, we call the system fault-tolerant or resilient. A failure occurs when the system as a whole stops providing its service. No system can be made tolerant of every possible type of fault, so any system can still potentially fail as a whole.
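The distinction can be illustrated with a minimal sketch. The class names here (Replica, Service, ReplicaDown, ServiceUnavailable) are hypothetical and chosen only for illustration: a fault in one replica is masked by trying the next one, and a failure is raised only when no replica can serve the request.

```python
class ReplicaDown(Exception):
    """A fault: one component deviates from its spec."""


class ServiceUnavailable(Exception):
    """A failure: the system as a whole cannot serve the request."""


class Replica:
    """Hypothetical component that either handles a request or faults."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ReplicaDown(self.name)
        return f"{self.name} handled {request}"


class Service:
    """Hypothetical service that tolerates faults by failing over."""

    def __init__(self, replicas):
        self.replicas = replicas

    def handle(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ReplicaDown:
                continue  # fault tolerated: try the next replica
        # Every replica faulted, so the fault becomes a system failure.
        raise ServiceUnavailable("all replicas are down")


# Replica "a" is faulty, but the service as a whole still answers.
service = Service([Replica("a", healthy=False), Replica("b")])
result = service.handle("GET /")
```

In this sketch the caller never sees the fault in replica "a". If replica "b" also became unhealthy, the same call would raise ServiceUnavailable, which is a failure in the sense described above.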

Netflix’s Chaos Monkey is an example of a tool designed to test the resiliency of services in the Netflix ecosystem. The tool randomly terminates service instances to expose services that cannot tolerate the loss of an instance.
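The idea behind this kind of testing can be sketched as a toy experiment. This is not Chaos Monkey's implementation or API; the functions serve and chaos_round and the instance names are assumptions made up for illustration. Each round terminates one randomly chosen running instance and then checks whether the service can still answer.

```python
import random


def serve(instances, request):
    """Return a response from the first running instance, or None on failure."""
    for name, running in instances.items():
        if running:
            return f"{name}: {request}"
    return None


def chaos_round(instances, rng):
    """Terminate one randomly chosen running instance and return its name."""
    running = [name for name, up in instances.items() if up]
    victim = rng.choice(running)
    instances[victim] = False
    return victim


rng = random.Random(42)  # fixed seed so the run is repeatable
instances = {"i-1": True, "i-2": True, "i-3": True}  # hypothetical instances

survived = 0  # number of terminations the service tolerated
for _ in range(len(instances)):
    chaos_round(instances, rng)
    if serve(instances, "ping") is None:
        break  # the service as a whole has failed
    survived += 1
```

With three instances, this toy service tolerates two terminations and fails on the third, so the experiment reveals exactly how much redundancy the service has.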

Scalability

A system is said to be scalable if it can continue to work correctly as the load on the system ...