
The Outage

Explore how to manage a major airline system outage by prioritizing service restoration and diagnosing critical dependencies. Understand the importance of targeted interventions like restarting specific application servers to recover check-in kiosks and IVR systems swiftly, maintaining uptime during high-demand periods.

Services stopped

At about 2:30 a.m., all the check-in kiosks went red on the monitoring console. Every single one, everywhere in the country, stopped servicing requests at the same time.

Red signals

A few minutes later, the IVR servers went red too. Not exactly panic time, but pretty close, because 2:30 a.m. Pacific time is 5:30 a.m. Eastern time, which is prime time for commuter flight check-in on the Eastern seaboard. The operations center immediately opened a Severity 1 case and got the local team on a conference call.

Restore services

In any incident, the first priority is always to restore service. Restoring service takes precedence over investigation. If we can collect some data ...