Case Study: The Exception That Grounded an Airline

Learn about the airline incident and the architecture of the core facilities system at the center of it.

Scale of the problem

Have you ever noticed that the incidents that blow up into the biggest issues start with something very small? A tiny programming error starts the snowball rolling downhill, and as it gains momentum, the problem keeps growing. A major airline experienced just such an incident. It eventually stranded thousands of passengers and cost the company hundreds of thousands of dollars. Here’s how it happened.

The airline incident

As always, all names, places, and dates have been changed to protect the confidentiality of the people and companies involved. This incident started with a planned failover on the database cluster that served the core facilities (CF). The airline was moving toward a service-oriented architecture, with the usual goals of:

  • Increasing reuse
  • Decreasing development time
  • Decreasing operational costs

At this time, CF was in its first generation. The CF team planned a phased rollout, driven by features. It was a sound plan, and it probably sounds familiar because most large companies have some variation of this project underway now.
