Basic Concepts

This lesson introduces the essential fundamentals and basic concepts of software traces and log analysis.

Software trace and log

What is a software trace or log? For our purposes, software traces and logs are simply sequences of formatted messages sent from running software. They are usually arranged by time and can be considered as the software’s narrative story.

Process

An operating system process is a container for resources such as memory, files, and synchronization objects. Even the OS kernel itself can be considered a process. Each process has its:

  • Process identifier (PID)
  • Thread identifier (TID)
  • Image name (for example, notepad.exe)
  • List of associated loaded DLL modules (or shared libraries in Linux/Mac OS X)

An image name is also a module. It is important to remember that several processes can run in parallel, each having the same image name. For example, there can be two instances of notepad.exe. Mostly, in such a case, the list of shared libraries (DLLs) in both instances is identical. At the same time, one image name may correspond to completely different processes because, on the startup, a process may load different modules (shared libraries) for different purposes. A Windows example is svchost executable. On a running Windows system, we can find many such svchost processes. When analyzing software logs, we can filter messages related to a specific PID or image name to find abnormal behavior according to the expected message flow. A typical example could be the following situation: after the middle of the log, we no longer see any messages from a specific PID, not even any termination or process end messages.

Thread

In Windows and Mac OS X, a thread is an execution unit owned by some process. In Linux, threads are almost the same as processes with the difference that they share the same virtual memory. We know that trace messages come from some thread because we need to execute code to emit a trace message. Each thread is executed on some CPU and, in general, can also have its CPU changed during execution history. Filtering by threads, for example, allows us to detect anomalous behavior such as blocked execution activity and various execution delays. Here, in the pictorial example below, we see a discontinuity for TID (thread identifier) 2 and a delay in TID 1.

Adjoint thread

If a thread is a linear ordered flow of activities associated with a particular TID, as seen from the perspective of trace messages through time, we can also extend this flow concept and consider a linear flow of activities associated with some other parameter such as PID, CPU ID, or message text. Such messages have different TIDs but have some chosen constant parameter or column value in the trace viewing tool. The name adjoint comes from the fact that, in threads of activity, TID stays the same whereas other message attributes vary; in adjoint threads, we have the opposite. In Windows Process Monitor, we use exclusive and inclusive filtering to form adjoint threads. By applying complex filtering criteria, we get adjoint threads from other adjoint threads—for example, an adjoint thread with a specific PID and file activity formed after an inspection of an adjoint thread with the same image name, i.e., svchost.exe. The illustration below is an example of an adjoint thread using Windows Process Monitor.

The following illustration shows what happens when we include the process name in the Windows Process Monitor.

Component/Module/Source

Trace messages come from a thread that belongs to a PID, but the code to emit them resides in source code files. Some source code files can also be reused, such as static library code, and included in different modules. Such DLL modules or shared libraries can also be loaded into different processes, such as hooking modules. Therefore, source or module (in a simpler case) is another grouping of messages based on subsystem and functional division that may include several source code files. By module or source filtering, we can see subsystem activities.

File and function

Source code consists of files. Each file has some functions inside it that do the actual tracing. With file or function filtering, we can see a more fine-grained flow of certain functionality than source or module adjoint thread.

Trace message

A formatted trace message is just a sentence with some invariant and variable parts. It is possible to trace the invariant and the variable parts, more technically known as the message invariant and data flow analysis patterns, respectively.

Stack trace

Because each trace message originated from some function in the source code, it has an associated stack trace that is similar to a live debugging scenario where we put a breakpoint at a trace message code location.

Note: It is important to remember that stack traces (or backtraces) are read from bottom to top, as in a debugger.

Minimal trace graphs

To illustrate trace analysis patterns graphically, we use the simplified abstract pictorial representation of a typical software trace or log. It has essential features, such as message number, time, PID, TID, and message text. Sometimes, for illustration purposes, we provide trace fragments in a textual form where we include or exclude certain message attributes (or columns), such as message source, module, time, or date.

Analysis patterns may be classified into several categories:

  • The vocabulary category consists of patterns related to a problem description from the point of view of a user.
  • The error category covers general error distribution patterns. These patterns are related to error and failure messages, either explicitly stating an error or indirectly via an error code, abnormal function return value, or status values in the failure range.
  • We also consider whole traces (here we ignore trace or log message contents and treat all messages statistically), their large-scale structure (about the coarse grain structure of software traces and logs where the division unit is often a component or some high-level functionality), activity patterns related to various software activities we see in logs and traces (most of them involve time dependency), patterns related to individual trace message structure, patterns related to collections of messages (called blocks), and, finally, patterns related to several traces and logs as a collection of artifacts from software incident (called trace sets).
  • There are also memory, text, code, and data categories.