Demand Control

Learn how a system can crash under excess demand, how socket limits (including ephemeral sockets) constrain a service under heavy load, and how long response times lead users to retry.

System crash

In the old days of mainframes in glass houses, we could predict what the workload looked like from day to day. Operators would measure how many MIPS (millions of instructions per second) a given job needed. Those days are long gone. Most of our services are either directly or indirectly exposed to the entire world’s population.

Our daily reality is this: the world can crush our systems at any time. There’s no natural protection. We have to build it. There are two basic strategies: either refuse work or scale out. For the moment, we’ll consider when, where, and how to refuse work.
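
To make “refuse work” concrete, here’s a minimal sketch (not from the text) using the JDK’s built-in com.sun.net.httpserver. A Semaphore caps the number of requests in flight; anything past the cap gets an immediate 503 instead of waiting in a queue. The limit of 100, the port, the listen backlog of 64, and the thread pool size are all assumed numbers, purely for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class RefuseWorkServer {
    // Assumed limit: at most 100 requests in flight; beyond that, shed load.
    private static final Semaphore inFlight = new Semaphore(100);

    public static void main(String[] args) throws IOException {
        // Second argument is the TCP listen backlog: another place to bound demand.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 64);
        server.createContext("/", exchange -> {
            if (!inFlight.tryAcquire()) {
                // Over capacity: refuse immediately rather than letting queues back up.
                exchange.sendResponseHeaders(503, -1);
                exchange.close();
                return;
            }
            try {
                byte[] body = "ok\n".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } finally {
                inFlight.release();
            }
        });
        server.setExecutor(Executors.newFixedThreadPool(16));
        server.start();
    }
}
```

The point isn’t the specific numbers; it’s that the refusal happens early, before the extra request ties up a thread or a downstream connection.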

How systems fail

Every failing system starts with a queue backing up somewhere. When thinking about a request/reply workload, we need to consider the resources being consumed and the queues guarding access to those resources. That lets us decide where to cut off new requests. Each request obviously consumes a socket on each tier it passes through. While the request is active on an instance, that instance has one fewer ephemeral socket available for new requests. In fact, that socket remains consumed for a little while after the request completes. (See TIME_WAIT and the Bogons.)
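
Each of those queues is a place where we can impose a bound. As a sketch (with assumed numbers: eight workers, a queue depth of 32, and a 50 ms stand-in for real request handling), a Java ThreadPoolExecutor with a bounded queue and an AbortPolicy rejects new work the moment the queue fills, which is exactly the kind of cut-off point we’re looking for:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedWorkQueue {
    public static void main(String[] args) {
        // Assumed sizing: 8 workers, at most 32 waiting requests. When the queue
        // is full, AbortPolicy rejects new work instead of letting it back up.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(32),
                new ThreadPoolExecutor.AbortPolicy());

        for (int i = 0; i < 1_000; i++) {
            try {
                pool.execute(() -> {
                    try {
                        Thread.sleep(50); // stand-in for real request handling
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                // This is the cut-off point: translate it into a 503 at the edge.
                System.out.println("request " + i + " refused: queue full");
            }
        }
        pool.shutdown();
    }
}
```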

Socket limitation

There’s a relationship between the number of sockets available and the number of requests per second our service can handle. That relationship depends on the duration of the requests. (They are related via Little’s law.) The faster our service retires requests, the more throughput it can handle. But we’re talking about systems under high levels of load. It’s natural to expect our service to slow down under heavy load, but that means fewer and fewer sockets are available to receive new requests exactly when the most requests are coming in. We call that “going nonlinear,” and we don’t mean it in a good way.
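
Little’s law says the average number of requests in the system equals the arrival rate times the average time each request spends in the system, L = λW. Turning that around, the throughput our sockets can sustain is the number of sockets divided by the request duration. A tiny sketch with assumed numbers shows how badly a slowdown cuts into that ceiling:

```java
public class LittlesLaw {
    public static void main(String[] args) {
        // Assumed numbers, purely illustrative.
        double availableSockets = 16_000;   // sockets an instance can devote to requests

        // L = lambda * W  =>  lambda = L / W
        double normalDuration = 0.25;       // seconds per request under light load
        double degradedDuration = 2.0;      // seconds per request under heavy load

        System.out.printf("max throughput at %.2fs/request: %.0f req/s%n",
                normalDuration, availableSockets / normalDuration);
        System.out.printf("max throughput at %.2fs/request: %.0f req/s%n",
                degradedDuration, availableSockets / degradedDuration);
        // Slowing from 0.25s to 2s cuts the ceiling from 64,000 to 8,000 req/s:
        // exactly when demand peaks, the capacity to accept new requests collapses.
    }
}
```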
