A Guide to Learning Software Trace and Log Analysis Patterns/

...

Basic Concepts

This lesson introduces the essential fundamentals and basic concepts of software traces and log analysis.

We'll cover the following...

Software trace and log
Process
Thread
Adjoint thread
Component/Module/Source
File and function
Trace message
Stack trace
Minimal trace graphs

Process

An operating system process is a container for resources such as memory, files, and synchronization objects. Even the OS kernel itself can be considered a process. Each process has its:

Process identifier (PID)
Thread identifier (TID)
Image name (for example, notepad.exe)
List of associated loaded DLL modules (or shared libraries in Linux/Mac OS X)

An image name is also a module. It is important to remember that several processes can run in parallel, each having the same image name. For example, there can be two instances of notepad.exe. Mostly, in such a case, the list of shared libraries (DLLs) in both instances is identical. At the same time, one image name may correspond to completely different processes because, on the startup, a process may load different modules (shared libraries) for different purposes. A Windows example is svchost executable. On a running Windows system, we can find many such svchost processes. When analyzing software logs, we can filter messages related to a specific PID or image name to find abnormal behavior according to the expected message flow. A typical example could be the following situation: after the middle of the log, we no longer see any messages from a specific PID, not even any termination or process end messages.

Thread

In Windows and Mac OS X, a thread is an execution unit owned by some process. In Linux, threads are almost the same as processes with the difference that they share the same virtual memory. We know that trace messages come from some thread because we need to execute code to emit a trace message. Each thread is executed on some CPU and, in general, can also have its CPU changed during execution history. Filtering by threads, for example, allows us to detect anomalous behavior such as blocked execution activity and various execution delays. Here, in the pictorial example below, we see a discontinuity for TID (thread identifier) 2 and a delay in TID 1.

Adjoint thread

If a thread is a linear ordered flow of activities associated with a particular TID, as seen from the perspective of trace messages through time, we can also extend this flow concept and consider a linear flow of activities associated with some other parameter such as PID, CPU ID, or message text. Such messages have different TIDs but have some chosen constant parameter or column value in the trace viewing tool. The name adjoint comes from the fact that, in threads of activity, TID stays the same whereas other message attributes vary; in adjoint threads, we have the opposite. In Windows Process Monitor, we use exclusive and inclusive filtering to form adjoint threads. By applying complex filtering criteria, we get adjoint threads from other adjoint threads—for example, an adjoint thread with a specific PID and file activity formed after an inspection of an adjoint thread with the same image name, i.e., svchost.exe. The illustration below is an example of an adjoint thread using Windows Process Monitor.

Component/Module/Source

Trace messages come from a thread that belongs to a PID, but the code to emit them resides in source code files. Some source code files can also be reused, such as static library code, and included in different modules. Such DLL modules or shared libraries can also be loaded into different processes, such as hooking modules. Therefore, source or module (in a simpler case) is another grouping of messages based on subsystem and functional division that may include several source code files. By module or source filtering, we can see subsystem activities.

Analysis patterns may be classified into several categories:

The vocabulary category consists of patterns related to a problem description from the point of view of a user.
The error category covers general error distribution patterns. These patterns are related to error and failure messages, either explicitly stating an error or indirectly via an error code, abnormal function return value, or status values in the failure range.
We also consider whole traces (here we ignore trace or log message contents and treat all messages statistically), their large-scale structure (about the coarse grain structure of software traces and logs where the division unit is often a component or some high-level functionality), activity patterns related to various software activities we see in logs and traces (most of them involve time dependency), patterns related to individual trace message structure, patterns related to collections of messages (called blocks), and, finally, patterns related to several traces and logs as a collection of artifacts from software incident (called trace sets).
There are also memory, text, code, and data categories.

Getting Started

Activity

Block and Data

Error and Large Scale

Message 1

Message 2

Trace as a Whole

Trace Set

Vocabulary, Memory, Code and Text

Conclusion

Index

Basic Concepts

Software trace and log

Process

Thread

Adjoint thread

Component/Module/Source

File and function

Trace message

Stack trace

Minimal trace graphs