Failures that remote calls are susceptible to

Consider building a test harness that substitutes for the remote end of every web service call. Because the remote call uses the network, the socket connection is susceptible to the following failures:

  • It can be refused.

  • It can sit in a listen queue until the caller times out.

  • The remote end can reply with a SYN/ACK and then never send any data.

  • The remote end can send nothing but RESET packets.

  • The remote end can report a full receive window and never drain the data.

  • The connection can be established, but the remote end never sends a byte of data.

  • The connection can be established, but packets could be lost, causing retransmit delays.

  • The connection can be established, but the remote end never acknowledges receiving a packet, causing endless retransmits.

  • The service can accept a request, send response headers (supposing HTTP), and never send the response body.

  • The service can send one byte of the response every thirty seconds.

  • The service can send a response of HTML instead of the expected XML.

  • The service can send megabytes when kilobytes are expected.

  • The service can refuse all authentication credentials.

These failures fall into distinct categories:

  • Network transport problems

  • Network protocol problems

  • Application protocol problems

  • Application logic problems

Failure modes in the OSI model

With a little mental exercise, we can find failure modes in every layer of the seven-layer OSI model. It would be costly and bizarre to add switches and flags to applications that would allow them to simulate all of these failures. Who would want to risk turning on a simulated failure once the system is promoted into production? Integration testing environments are good at examining failures only in the seventh layer, the application layer, and not even all of those.

Test harness scope

A test harness “knows” that it’s meant for testing. It has no other role to play. Although the real application wouldn’t be written to call the low-level network APIs directly, the test harness can be. Therefore, it’s able to send bytes too quickly, or very slowly. It can set up extremely deep listening queues. It can bind to a socket and then never service a single connection attempt. The test harness should act like a little hacker, trying all kinds of bad behavior to break callers.
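As a minimal sketch of what calling the low-level network APIs directly can look like, the Python below implements two of the misbehaviors just described: a server that sends its response one byte at a time, and a listener that binds a port but never services a connection. The function names, the localhost binding, and the delay parameter are illustrative assumptions, not part of any particular harness.

```python
import socket
import threading
import time


def start_slow_sender(payload: bytes, delay_s: float):
    """Listen on an OS-chosen localhost port and drip `payload` one byte at a time.

    Returns (listener_socket, port). Hypothetical helper for illustration.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)

    def serve_one() -> None:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed before anyone connected
        with conn:
            for i in range(len(payload)):
                try:
                    conn.sendall(payload[i:i + 1])
                except OSError:
                    return  # the caller gave up and went away
                time.sleep(delay_s)

    threading.Thread(target=serve_one, daemon=True).start()
    return listener, listener.getsockname()[1]


def start_silent_listener():
    """Bind and listen, but never call accept().

    The operating system completes the TCP handshake and leaves the
    connection in the listen queue, so the caller connects and then
    waits for data that never arrives.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    return listener, listener.getsockname()[1]
```

A caller pointed at either port should survive only if it sets its own timeouts; a caller that blocks forever on a read has just revealed a bug.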

Bad network behavior

Many kinds of bad behavior will be similar for different applications and protocols. For example, refusing connections, connecting slowly, and accepting requests without replying apply to any socket-based protocol: HTTP, RMI, or RPC. For these, a single test harness can simulate many types of bad network behavior. One trick is to have different port numbers indicate different kinds of misbehavior. Port 10200 accepts connections but never replies. Port 10201 accepts a connection and replies, but the reply is copied from /dev/random. Port 10202 opens a connection and then drops it immediately, and so on. That way, we don’t need to change modes on the test harness, and a single test harness can break many applications. It can even help with functional testing in the development environment by letting multiple developers hit the test harness from their workstations. Of course, it’s also worthwhile to let the developers run their own instances of the killer test harness.
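The port-per-misbehavior trick can be sketched as a table that maps each port to a handler. The port numbers below come from the text; the handler names, the `start_behavior` helper, and the use of `os.urandom` as a portable stand-in for reading /dev/random are assumptions made for illustration.

```python
import os
import socket
import threading


def never_reply(conn: socket.socket) -> None:
    """Accept the request but send nothing; hold the connection until the caller quits."""
    while conn.recv(4096):
        pass  # read and discard whatever the caller sends


def reply_with_garbage(conn: socket.socket) -> None:
    """Reply with random bytes (stand-in for copying from /dev/random)."""
    conn.sendall(os.urandom(256))


def drop_immediately(conn: socket.socket) -> None:
    """Do nothing; returning lets the harness close the connection at once."""


# Port numbers from the text, each bound to one kind of misbehavior.
BEHAVIORS = {
    10200: never_reply,
    10201: reply_with_garbage,
    10202: drop_immediately,
}


def start_behavior(port: int, handler, host: str = "127.0.0.1") -> socket.socket:
    """Serve `handler` on `port` in background threads and return the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()

    def run(conn: socket.socket) -> None:
        with conn:  # always close the connection when the handler returns
            try:
                handler(conn)
            except OSError:
                pass  # the caller disconnected first; nothing to clean up

    def accept_loop() -> None:
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener closed; stop serving
            threading.Thread(target=run, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

Calling `start_behavior(port, handler)` for each entry in `BEHAVIORS` brings up every misbehavior at once, so a caller selects a failure mode just by changing the port it connects to, and nobody has to reconfigure the harness between tests.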

Bear in mind that the test harness might be really, really good at breaking, even killing, applications. It’s not a bad idea to have the test harness log requests, in case the application dies without so much as a whimper to indicate what killed it. A test harness that injects faults will unearth many hidden dependencies. Injecting latency in requests will uncover many more. Reordering TCP packets will uncover more again. The only limit is imagination.

The test harness can be designed like an application server. It can have pluggable behavior for the tests that are related to the real application. A single framework for the test harness can be subclassed to implement whatever application-level protocol, or perversion of that protocol, is necessary. Broadly speaking, a test harness leads toward “chaos engineering.”
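One way to picture a subclassable framework with request logging is a small base class built on Python's standard socketserver module. This is only a sketch under stated assumptions: the class names, the `misbehave` hook, and the `serve` helper are hypothetical, and the two subclasses reproduce application-level failures from the list earlier (HTML where XML is expected, and headers with no body).

```python
import logging
import socketserver
import threading

log = logging.getLogger("harness")


class LoggingBehavior(socketserver.BaseRequestHandler):
    """Base class: log each request, then delegate to a pluggable misbehavior."""

    def handle(self) -> None:
        try:
            request = self.request.recv(65536)
            # Logged first, so there is a record if the caller dies afterward.
            log.info("request from %s: %r", self.client_address, request)
            self.misbehave(request)
        except OSError:
            pass  # the caller disconnected first

    def misbehave(self, request: bytes) -> None:
        raise NotImplementedError


class HtmlInsteadOfXml(LoggingBehavior):
    """Answer with an HTML page where the caller expects XML."""

    def misbehave(self, request: bytes) -> None:
        body = b"<html><body>Service unavailable</body></html>"
        head = (
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/html\r\n"
            b"Content-Length: %d\r\n"
            b"Connection: close\r\n\r\n" % len(body)
        )
        self.request.sendall(head + body)


class HeadersThenSilence(LoggingBehavior):
    """Send response headers, then never send the promised body."""

    def misbehave(self, request: bytes) -> None:
        self.request.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: application/xml\r\n"
            b"Content-Length: 1024\r\n\r\n"
        )
        while self.request.recv(4096):
            pass  # hold the connection open until the caller gives up


def serve(behavior, port: int = 0) -> socketserver.ThreadingTCPServer:
    """Run `behavior` on localhost in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), behavior)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Adding a new application-level failure then means writing one more subclass with its own `misbehave` method, while accepting connections, threading, and logging stay in the shared base.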

Tips to remember

Emulate out-of-spec failures

Calling real applications lets us test only those errors that the real application can deliberately produce. A good test harness lets us simulate all sorts of messy, real-world failure modes.
