Case Study: The Exception That Grounded an Airline

Explore the case study of an airline system outage caused by a small programming error that escalated into a major disruption. Understand how core facility architecture with redundancy and failover mechanisms is designed to maintain high availability and what lessons can be drawn to prevent future failures.

Scale of the problem

Have you ever noticed that the incidents that blow up into the biggest issues start with something small? A tiny programming error starts the snowball rolling downhill. As it gains momentum, the scale of the problem keeps growing. A major airline experienced just such an incident. The outage eventually stranded thousands of passengers and cost the company hundreds of thousands of dollars. Here’s how it happened.

The airline incident

As always, all names, places, and dates have been changed to protect the confidentiality of the people and companies involved. This incident started with a planned failover on ...