Let It Crash

Learn why letting a single component crash can keep one error from taking down the whole system, and explore limited granularity, fast replacement, when to restart the whole system after a failure, and reintegration.

Handling errors

Sometimes the best way to create system-level stability is to abandon component-level stability. In the Erlang world, this is called the “let it crash” philosophy. We know from “Case Study: The Exception That Grounded an Airline” that there is no hope of preventing every possible error. Dimensions proliferate and the state space grows exponentially. There’s just no way to test everything or predict all the ways a system can break. We must assume that errors will happen. The key question is, “What do we do with the error?” Most of the time, we try to recover from it. That means getting the system back into a known good state, using exception handlers to fix the execution stack and try-finally blocks or block-scoped resources to clean up memory and other resources.
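The recovery approach described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: `FakeConnection`, `run_with_try_finally`, `connection`, and `run_with_block_scope` are hypothetical names invented for this example. The first function uses an exception handler plus a try-finally block, and the second uses a block-scoped resource (a `with` statement) to get the same cleanup guarantee.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a resource such as a database connection."""

    def __init__(self):
        self.open = True

    def query(self, sql):
        # Simulate an error that the caller did not anticipate.
        if "bad" in sql:
            raise ValueError("malformed query")
        return "ok"

    def close(self):
        self.open = False


def run_with_try_finally(sql):
    """Recover with an exception handler; clean up with try-finally."""
    conn = FakeConnection()
    try:
        return conn.query(sql), conn
    except ValueError:
        # The handler returns the caller to a known good state.
        return "recovered", conn
    finally:
        # Runs on both the success path and the error path.
        conn.close()


@contextmanager
def connection():
    """Block-scoped resource: closed when the with-block exits."""
    conn = FakeConnection()
    try:
        yield conn
    finally:
        conn.close()


def run_with_block_scope(sql):
    """Same recovery, with cleanup tied to the scope of the with-block."""
    with connection() as conn:
        try:
            result = conn.query(sql)
        except ValueError:
            result = "recovered"
    return result, conn
```

In both variants the connection is closed whether or not the query fails. That cleanup guarantee is what returns the component to a known good state after an error.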
