Well-Architected Framework: Reliability

Get introduced to the reliability design principles.

The reliability pillar covers a system's ability to recover from service or infrastructure disruptions, and its ability to dynamically acquire computing resources to meet demand.

For example, a tool such as Chaos Monkey, which deliberately terminates running instances, can be used to test recovery procedures.
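The idea of testing recovery by injecting failures can be sketched in a few lines. This is a minimal in-memory simulation, not the Chaos Monkey tool or any cloud API: `InstancePool`, `terminate_random`, `recover`, and `run_chaos_experiment` are hypothetical names invented for illustration, and a real test would act on actual infrastructure.

```python
import random


class InstancePool:
    """Hypothetical in-memory stand-in for a fleet of service instances."""

    def __init__(self, desired_count):
        self.desired_count = desired_count
        self._next_id = 0
        self.running = set()
        self.recover()  # launch instances until the desired size is reached

    def _launch(self):
        self.running.add(f"instance-{self._next_id}")
        self._next_id += 1

    def terminate_random(self, rng):
        """Fault injection: remove one running instance chosen at random."""
        victim = rng.choice(sorted(self.running))
        self.running.remove(victim)
        return victim

    def recover(self):
        """Recovery procedure under test: replace missing instances."""
        launched = 0
        while len(self.running) < self.desired_count:
            self._launch()
            launched += 1
        return launched


def run_chaos_experiment(pool, rounds, seed=0):
    """Repeatedly kill an instance, run recovery, and check the pool healed."""
    rng = random.Random(seed)
    for _ in range(rounds):
        victim = pool.terminate_random(rng)
        pool.recover()
        healed = len(pool.running) == pool.desired_count
        if victim in pool.running or not healed:
            return False
    return True
```

Calling `run_chaos_experiment(InstancePool(3), rounds=10)` returns `True` only if the pool is back at its desired size after every injected failure, which is the property a recovery test is meant to verify.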

Design principles

  1. Test recovery procedures
  2. Automatically recover from failure
  3. Scale horizontally to increase aggregate system availability
  4. Stop guessing capacity, i.e., avoid both under-provisioning and over-provisioning.
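Principles 3 and 4 lend themselves to a small worked calculation. The sketch below assumes that instance failures are independent and that any single healthy instance can serve traffic, which real systems only approximate. The function names and the requests-per-second sizing rule are illustrative assumptions, not part of the framework.

```python
import math


def aggregate_availability(instance_availability, instances):
    """Availability of N redundant instances.

    Assumes independent failures and that one healthy instance is enough,
    so the system is down only when every instance is down at once.
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return 1.0 - (1.0 - instance_availability) ** instances


def desired_instances(demand_rps, per_instance_rps, minimum=1):
    """Instances needed for the observed demand (hypothetical sizing rule).

    Sizing from measured demand, rather than a fixed guess, avoids both
    under-provisioning and over-provisioning.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    return max(minimum, math.ceil(demand_rps / per_instance_rps))
```

Under these assumptions, two instances that are each 99% available give roughly 99.99% aggregate availability, and a demand of 950 requests per second against instances that each handle 200 calls for 5 instances.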

Reliability in the cloud

Reliability in the cloud consists of three areas:

  1. Foundations
  2. Change Management
  3. Failure Management
