Monitoring and Telemetry
Learn how to monitor the behavior of agents and APIs in Llama Stack using its built-in telemetry system. Explore structured logs, metrics, and traces that help us debug, analyze, and optimize AI applications.
As we build more complex systems that incorporate agents, tools, safety measures, and retrieval mechanisms, understanding their internal workings becomes increasingly difficult. An agent might retrieve the wrong document, call a tool incorrectly, or fail to complete a turn. We need visibility not just into the final output, but into every intermediate step.
That’s where telemetry comes in.
Llama Stack provides a built-in telemetry system that emits structured events throughout the execution of agents and APIs. These include logs, spans, and metrics. With telemetry enabled, we can:
Trace multi-step workflows like agent turns.
Debug tool calls and safety violations.
Measure inference latency and tool usage.
Store structured traces locally or forward them to observability platforms.
This lesson will show how to configure telemetry, interpret its output, and use it to improve our applications.
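As a first taste of what that looks like in practice, here is a minimal sketch that lists recent traces from a running Llama Stack server using the Python client. The server address, the telemetry method names (query_traces, get_span_tree), and the shape of the returned objects are assumptions based on one version of the llama-stack-client package and may differ in yours; treat this as a sketch rather than a definitive API reference.

```python
# Minimal sketch: inspecting stored traces through the Llama Stack telemetry API.
# Assumptions: a Llama Stack server is running locally (port 8321 here; adjust to
# your setup), telemetry is enabled, and this client version exposes
# client.telemetry.query_traces / get_span_tree with the shapes used below.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Each trace corresponds to one top-level operation, such as an agent turn,
# and is identified by a trace_id with a root span.
traces = client.telemetry.query_traces(limit=5)

for trace in traces:  # depending on the client version, this may be `traces.data`
    print(f"trace {trace.trace_id}: started {trace.start_time}")

    # The span tree breaks the trace into its intermediate steps, such as
    # inference calls, tool executions, and safety shield checks.
    span_tree = client.telemetry.get_span_tree(span_id=trace.root_span_id)
    print(span_tree)
```

Even this small example reflects the core idea of the lesson: every agent turn leaves behind a structured record that we can query after the fact instead of guessing what happened.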
Why telemetry matters
In traditional software systems, observability is critical: we need to know what the system did, how long it took, and where it failed. GenAI applications are no different. Without visibility into inference steps, tool calls, and memory ...