Amazon S3 Service Disruption

Learn about the 2017 Amazon S3 service disruption and possible techniques for mitigating such failures.

Introduction

Amazon Simple Storage Service (S3) is an object storage service offered by Amazon Web Services (AWS). It is a highly secure, scalable, and durable service that lets applications store and retrieve data from anywhere.

On February 28, 2017, S3 began failing in the Northern Virginia (US-EAST-1) region because of a human error. The disruption lasted several hours (source: https://aws.amazon.com/message/41926/) and affected many AWS customers, including Slack, Netflix, and Reddit.

In this lesson, we discuss the root cause of the S3 failure and how to mitigate such failures.

How did it happen?

The root cause of the S3 outage was a human error made during a routine debugging process. Let's look at how this happened:

  1. An Amazon S3 team member was troubleshooting an issue with the S3 billing system. The intention was to remove a small number of servers from one of the S3 subsystems used by the billing process.

  2. One of the inputs to the command was entered incorrectly, which removed a much larger set of S3 servers than intended.

  3. The removed servers supported two other S3 subsystems, the index and placement subsystems, which then failed. The index subsystem manages the metadata and location information of S3 objects and is needed to serve GET, LIST, PUT, and DELETE requests. The placement subsystem is responsible for allocating storage for new objects. Because the removal took away a significant share of their capacity, both subsystems required a full restart.

  4. The outage affected not only the S3 service itself but also other AWS services, such as Amazon Elastic Compute Cloud (EC2), AWS Lambda, and Amazon CloudFront, which rely on S3 for data storage and retrieval. The impact of the outage varied among customers and applications, but many experienced slow performance, errors, or complete downtime.

  5. AWS could not update the AWS Service Health Dashboard (SHD) to report the outage, because the SHD administration console itself depended on Amazon S3.
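The cascade in steps 2 to 5 can be sketched as a small dependency model. This is a minimal Python sketch under loudly stated assumptions: the subsystem names, server counts, minimum-capacity thresholds, and dependency edges in `build_region` are hypothetical, chosen only to mirror the relationships described above, and `remove_servers` and `failed_systems` are illustrative helpers, not AWS tooling. A subsystem counts as failed when its capacity drops below its minimum or when anything it depends on has failed, and failures are propagated until no new subsystem goes down.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    name: str
    servers: int
    # Hypothetical floor: below this many servers the subsystem cannot serve traffic.
    min_servers: int
    # Names of subsystems this one needs in order to work (illustrative edges).
    depends_on: list[str] = field(default_factory=list)


def remove_servers(system: Subsystem, count: int) -> None:
    """Remove capacity with no safety check, as a mistyped command might."""
    system.servers = max(0, system.servers - count)


def failed_systems(systems: dict[str, Subsystem]) -> set[str]:
    """Return every subsystem that is down, directly or through a dependency."""
    # Direct failures: capacity has dropped below the minimum.
    down = {name for name, s in systems.items() if s.servers < s.min_servers}
    # Propagate along dependency edges until no new subsystem fails.
    changed = True
    while changed:
        changed = False
        for name, s in systems.items():
            if name not in down and any(dep in down for dep in s.depends_on):
                down.add(name)
                changed = True
    return down


def build_region() -> dict[str, Subsystem]:
    """Build a toy region; all names, sizes, and edges are hypothetical."""
    systems = [
        Subsystem("billing", servers=20, min_servers=5),
        Subsystem("index", servers=100, min_servers=60),
        Subsystem("placement", servers=50, min_servers=30, depends_on=["index"]),
        Subsystem("s3_api", servers=200, min_servers=100,
                  depends_on=["index", "placement"]),
        Subsystem("ec2_launch", servers=80, min_servers=40, depends_on=["s3_api"]),
        Subsystem("lambda", servers=80, min_servers=40, depends_on=["s3_api"]),
        Subsystem("shd_console", servers=4, min_servers=1, depends_on=["s3_api"]),
    ]
    return {s.name: s for s in systems}


if __name__ == "__main__":
    region = build_region()
    # Intended operation: take a small number of servers out of one subsystem.
    remove_servers(region["billing"], 2)
    print(sorted(failed_systems(region)))  # []
    # Mistyped input: a large share of index and placement capacity is removed too.
    remove_servers(region["index"], 50)
    remove_servers(region["placement"], 25)
    print(sorted(failed_systems(region)))
    # ['ec2_launch', 'index', 'lambda', 'placement', 's3_api', 'shd_console']
```

The fixed-point loop keeps the sketch free of graph libraries; for a large dependency graph, a breadth-first walk over reversed edges would find the same set of failed subsystems more efficiently.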
