Antipatterns

Learn about the origins of software engineering, the growth of user bases from hundreds to millions, and how complexity and tight coupling make systems prone to failure.

Software crisis

Delegates to the first NATO Software Engineering Conference coined the term “software crisis” in 1968. They meant that demand for new software outstripped the capacity of all existing programmers worldwide. If that truly was the start of the software crisis, then it has never ended!

Our machines have improved by orders of magnitude, and so have the languages and libraries. The enormous leverage of open source multiplies our abilities. And of course, something like a million times more programmers are in the world now than there were in 1968.

Overall, our ability to create software has followed its own kind of Moore’s law exponential curve. So why are we still in a software crisis? Because we’ve steadily taken on bigger and bigger challenges.

Increasing number of users

In the hazy days of client/server systems, we used to think of a hundred active users as a large system, but now we think about millions. That’s up from the first edition of this course, when ten thousand active users was a lot.

We’ve just seen our first billion-user site. In 2016, Facebook announced that it had 1.13 billion daily active users. An “application” now consists of dozens or hundreds of services, each running and getting redeployed continuously. Five-nines (99.999%) reliability for the overall application is nowhere near enough because it would result in thousands of disappointed users every day. Six Sigma quality on Facebook would create 768,000 angry users per day (200 requests per page, 1.13 billion daily active users, 3.4 defects per million opportunities).
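The arithmetic behind those figures can be reproduced with a short Python sketch. The user count, requests per page, and Six Sigma defect rate come from the paragraph above. The five-nines calculation additionally assumes, purely for illustration, that each daily active user has exactly one interaction per day that can fail; the function names are hypothetical.

```python
# Figures quoted in the text.
DAILY_ACTIVE_USERS = 1_130_000_000
REQUESTS_PER_PAGE = 200
SIX_SIGMA_DEFECTS_PER_MILLION = 3.4

# Five-nines (99.999%) reliability means 1 failure in 100,000.
FIVE_NINES_FAILURE_RATE = 1e-5


def six_sigma_angry_users(users, requests_per_page, defects_per_million):
    """Defects per day if every user loads one page made of many requests."""
    opportunities = users * requests_per_page
    return opportunities * defects_per_million / 1_000_000


def five_nines_disappointed_users(users, failure_rate):
    """Failures per day, ASSUMING one failure-prone interaction per user per day."""
    return users * failure_rate


if __name__ == "__main__":
    angry = six_sigma_angry_users(
        DAILY_ACTIVE_USERS, REQUESTS_PER_PAGE, SIX_SIGMA_DEFECTS_PER_MILLION
    )
    disappointed = five_nines_disappointed_users(
        DAILY_ACTIVE_USERS, FIVE_NINES_FAILURE_RATE
    )
    print(f"Six Sigma defects per day: {angry:,.0f}")
    print(f"Five-nines failures per day: {disappointed:,.0f}")
```

Under these assumptions, the Six Sigma case works out to roughly 768,000 defects per day and the five-nines case to about 11,300, which matches the "thousands of disappointed users" in the text. Real users make many more than one request per day, so the true five-nines figure would be higher.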

Breadth of application

The breadth of our applications’ reach has exploded, too. Everything within the enterprise is interconnected, and then interconnected again as we integrate across enterprises. Even the boundaries of our applications have become fuzzy as more features are delegated to SaaS services.

Technology frontier

Of course, this also means bigger challenges. As we integrate the world, tightly coupled systems are the rule rather than the exception. Big systems serve more users by commanding more resources, but in many failure modes big systems fail faster than small systems. The size and the complexity of these systems push us to what author James R. Chiles calls in Inviting Disaster [Chi01] the “technology frontier,” where the twin specters of high interactive complexity and tight coupling conspire to turn rapidly moving cracks into full-blown failures.

High interactive complexity arises when systems have enough moving parts and hidden, internal dependencies that most operators’ mental models are either incomplete or just plain wrong.

In a system exhibiting high interactive complexity, the operator’s instinctive actions will have results ranging from ineffective to actively harmful. With the best of intentions, the operator can take an action, based on their own mental model of how the system functions, that triggers a completely unexpected linkage.

Problem inflation

Such linkages contribute to “problem inflation,” turning a minor fault into a major failure. For example, hidden linkages in cooling monitoring and control systems are partly to blame for the Three Mile Island reactor incident, as Chiles outlines in his book. These hidden linkages often appear obvious during the postmortem analysis, but are in fact devilishly difficult to anticipate.

Tight coupling

Tight coupling allows cracks in one part of the system to propagate or multiply across layer or system boundaries. A failure in one component causes load to be redistributed to its peers and introduces delays and stress to its callers. This increased stress makes it extremely likely that another component in the system will fail. That in turn makes the next failure more likely, eventually resulting in total collapse. In our systems, tight coupling can appear within application code, in calls between systems, or any place a resource has multiple consumers.
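To make the redistribution effect concrete, here is a deliberately simplified Python sketch. It assumes a pool of identical servers that share a fixed total load evenly, and it assumes that any server pushed past its capacity fails and hands its share to the survivors. The function name, the parameters, and the numbers are all hypothetical; real systems fail in messier ways.

```python
def survivors_after_failure(servers, capacity, total_load, initial_failures=1):
    """Count servers still alive once a failure has finished cascading.

    Toy model (an assumption, not a description of any real system):
    the total load is split evenly across live servers, and a server
    fails as soon as its share exceeds its capacity.
    """
    alive = servers - initial_failures
    # Each failure raises the load on the remaining peers, which can
    # push the next one over its limit.
    while alive > 0 and total_load / alive > capacity:
        alive -= 1
    return alive


if __name__ == "__main__":
    # Ten servers, each able to handle 100 units of load.
    # With headroom, losing one server is absorbed by the other nine.
    print(survivors_after_failure(servers=10, capacity=100, total_load=850))
    # Running closer to the limit, the same single failure takes down the pool.
    print(survivors_after_failure(servers=10, capacity=100, total_load=950))
```

In the first case the nine survivors each carry about 94 units and the pool stabilizes. In the second case each survivor is pushed to about 106 units, so another server fails, which raises the load again, and the cascade continues until nothing is left.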

In the next chapter, we’ll look at some patterns that can alleviate or prevent the harm these antipatterns do to our system. Before we can get to that good news, though, we need to understand what we’re up against.

What to avoid

In this chapter, we’ll look at antipatterns that can break our system. These are common forces that have contributed to more than one system failure. Each of these antipatterns will create, accelerate, or multiply cracks in the system. These bad behaviors are to be avoided.

Simply avoiding these antipatterns isn’t sufficient, though. Everything breaks. Faults are unavoidable. We can’t pretend we can eliminate every possible source of them, because either nature or nurture will create bigger disasters to wreck our systems. Remember, faults will happen; we just need to examine what happens after the fault creeps in.