It’s prudent to start with a clear definition of observability. In its general sense, it means the ability to notice or discern something. Applied to our focus on software applications, we can be more specific about what we discern and how we do so: the ability to measure the internal state of a system from its external outputs.
In software systems, this is achieved by enabling what are commonly referred to as the three pillars of observability:
Metrics: A series of measurements over time
Logs: A record of messages describing noticeable events within a system
Traces: A set of identifiers carried through logs that connect a series of related events (see the sketch after this list)
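To make these pillars a little more tangible before we go further, here’s a minimal Python sketch (not taken from the MTAEDA codebase; the service names and the trace_id field are purely illustrative) showing how a shared trace identifier carried in structured log records is what lets us connect a series of related events:

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("observability-demo")

def log_event(service: str, message: str, trace_id: str) -> None:
    """Emit a structured log record carrying a trace ID."""
    logger.info(json.dumps({
        "service": service,
        "message": message,
        "trace_id": trace_id,  # the identifier that ties related events together
    }))

# One logical operation flows through two services; the shared trace_id
# lets us reconstruct that flow later from the combined logs.
trace_id = str(uuid.uuid4())
log_event("order-api", "order received", trace_id)
log_event("payment-service", "payment authorized", trace_id)
```

Searching the aggregated logs for that single trace_id reconstructs the whole flow, which is exactly the correlation work a tracing backend automates.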
At this point, it’s worth addressing the critical commentary on the three pillars of observability found in more recent online publications. There are indeed some shortcomings to these pillars at hyper-scale, and thought leaders are evolving beyond them to refine the quality and accuracy of observability even further. However, understanding and using the three pillars is a fundamental requirement for going beyond them. They’re entirely relevant and valuable to the MTAEDA application, and implementing these basic principles is required for all microservice architectures.
In the world of EDA, we have two significant complications to overcome that require a modified approach to implementing these pillars:
Applications with a microservice architecture are, by definition, made up of many smaller, separate components that run in total process isolation from one another. They might even run on different infrastructures, be developed in different languages, or require different CPU architectures.
Event-driven architecture decouples microservices to the point that we cannot control how many downstream consumers react to a single produced event, as the sketch after this list illustrates.
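To illustrate that second complication, here’s a toy publish/subscribe sketch in plain Python, with an in-memory topic standing in for a real broker; the topic name and handlers are hypothetical. The producer only ever addresses a topic and has no visibility of how many consumers will react:

```python
from collections import defaultdict
from typing import Callable

# A toy in-memory broker: real systems would use Kafka, RabbitMQ, etc.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    # The producer has no idea how many handlers run, or what they do.
    for handler in subscribers[topic]:
        handler(event)

# Downstream consumers register independently of the producer.
subscribe("order.created", lambda e: print("billing saw", e))
subscribe("order.created", lambda e: print("shipping saw", e))
subscribe("order.created", lambda e: print("analytics saw", e))

publish("order.created", {"order_id": 42})
```

Adding a fourth consumer requires no change to the producer at all, which is precisely why the producer cannot report on what happens downstream.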
For each of the three pillars, let’s understand the specific challenges introduced by these complications, as compared with monolithic applications. This will provide the key questions to answer in subsequent sections of this chapter.
Metrics
An application can produce a series of metrics that make it more observable. The primary and most critical metric is whether the application is running. For classic monoliths, this is generally not a difficult thing to observe. If the application is not running or has internally stalled, it won’t process any more work. A monolith may be monitored by an external service to check that it’s still running, often referred to as a health probe or availability signal.
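As a sketch of what such an external check might look like, the following Python snippet polls a hypothetical /healthz endpoint and reports only whether the application answered at all; the URL, port, and polling interval are assumptions made for illustration:

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8080/healthz"  # hypothetical monolith endpoint

def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the application responds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, DNS failure, timeout, etc.
        return False

while True:
    status = "up" if probe(HEALTH_URL) else "down"
    print(f"{time.strftime('%H:%M:%S')} application is {status}")
    time.sleep(30)  # polling interval chosen arbitrarily for the sketch
```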
In the world of microservices, we can’t (and shouldn’t) think of a single availability signal for the entire application. The application is made up of many independently running services, each with its own ability to be healthy—or not. One service being unavailable doesn’t mean a total failure of the overall system. Other services should be fault-tolerant and still function. It might be that continuing to function is restricted to responding with a dependency failure message, but the point is that it’s still responsive and able to function in some way.
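The following standard-library Python sketch illustrates that behavior: the service keeps answering its health endpoint even when a (simulated) downstream dependency is unavailable, reporting itself as degraded instead of going silent. The endpoint path, port, and dependency check are illustrative assumptions rather than details of the MTAEDA application:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependency_available() -> bool:
    """Placeholder for a real check against a downstream service or database."""
    return False  # simulate a failed dependency

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_response(404)
            self.end_headers()
            return
        # The service itself is alive; report dependency trouble instead of failing.
        healthy = dependency_available()
        body = json.dumps({
            "status": "ok" if healthy else "degraded",
            "dependency": "available" if healthy else "unavailable",
        }).encode()
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Whether a degraded state should return 200 or 503 depends on how the orchestrator or monitor interprets the probe; the point here is simply that the service remains responsive and able to describe its own state.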
Furthermore, microservices should scale horizontally. While one instance of a service might be unhealthy or unavailable, it should not represent a single point of failure for the overall system.
Other metrics become important when we start to look at the performance, or throughput, of a service. Where a monolith can report CPU usage against a single binding limit (at 100%, our application is maxed out), a microservice landscape is monitored as a mix of CPU values, each representing the demand placed on a specific, smaller chunk of the overall application’s functionality and providing a useful way to drive horizontal scaling. In addition to CPU, an instance of a single service might emit its memory usage, storage performance, or even a custom metric that’s relevant to its purpose (for example, the number of concurrent widgets it’s processing).
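As one way to picture such a custom metric, the sketch below uses the prometheus_client Python library to expose a widgets_in_progress gauge on a Prometheus-style scrape endpoint; the metric name, port, and simulated workload are assumptions for this example:

```python
import random
import time

from prometheus_client import Gauge, start_http_server

# A custom metric that only makes sense for this particular service.
WIDGETS_IN_PROGRESS = Gauge(
    "widgets_in_progress",
    "Number of widgets currently being processed by this instance",
)

if __name__ == "__main__":
    # Exposes all registered metrics at http://localhost:8000/metrics
    start_http_server(8000)
    while True:
        # In a real service this would track actual work; here we simulate it.
        WIDGETS_IN_PROGRESS.set(random.randint(0, 25))
        time.sleep(5)
```

An autoscaler can then act on this value in the same way it would on CPU usage, adding instances of only this service when widget load grows.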