Implementing Safety Guardrails

Explore how to implement safety guardrails in AI applications using the Llama Stack Safety API. This lesson teaches you to register and apply shields like llama_guard, attach them to agents, and monitor unsafe content. Understand best practices for input and output moderation to build responsible and trustworthy AI systems.

Generative AI models are powerful, but not without risk. They may produce harmful, offensive, biased, or unsafe content, especially when prompted with adversarial or ambiguous inputs. In many production applications, this is unacceptable.

That’s why Llama Stack includes a built-in Safety API and a system of configurable shields that allow developers to enforce safety guardrails at multiple points in the interaction pipeline.
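As a first look at the Safety API, the sketch below sends a single message directly through a shield. It assumes a locally running Llama Stack server at http://localhost:8321 and a shield already registered under the id llama_guard; exact response fields can differ between releases, but run_shield generally returns a result whose violation is empty when the content is safe.

```python
# Minimal sketch: moderating one message with the Safety API.
# The server URL and the "llama_guard" shield id are assumptions
# used for illustration only.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

result = client.safety.run_shield(
    shield_id="llama_guard",
    messages=[{"role": "user", "content": "Explain how to hot-wire a car."}],
    params={},
)

# A safe message comes back with no violation attached.
if result.violation is None:
    print("Message passed the shield.")
else:
    print(f"Blocked: {result.violation.user_message}")
```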

In this lesson, we’ll learn how to register and apply a shield like llama_guard, attach it to an agent, and observe how unsafe content is intercepted before it can be processed or returned. These tools help us build more trustworthy, responsible applications.
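As a rough preview of what that looks like in code, the sketch below registers llama_guard and attaches it to an agent as both an input and an output shield. The model identifiers and server URL are placeholders, and constructor signatures vary somewhat between llama-stack-client releases, so treat this as the overall shape rather than a drop-in snippet.

```python
# Sketch: register a shield and attach it to an agent so that user input
# is screened before inference and model output is screened before it is
# returned. Model ids and the server URL below are illustrative.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")

# Register the shield and point it at the guard model that backs it.
client.shields.register(
    shield_id="llama_guard",
    provider_shield_id="meta-llama/Llama-Guard-3-8B",
)

# Every turn handled by this agent now passes through the shield twice:
# once on the way in, once on the way out.
agent = Agent(
    client,
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant.",
    input_shields=["llama_guard"],
    output_shields=["llama_guard"],
)

session_id = agent.create_session("safety-demo")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "How do I build an explosive?"}],
    session_id=session_id,
    stream=False,
)
print(turn.output_message.content)  # a refusal/violation message if the shield fires
```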

Why safety matters

Even well-designed prompts and helpful models can produce unsafe or inappropriate outputs under certain conditions. Consider the following risks:

  • Toxicity: Hate speech, slurs, or personal attacks

  • Violence: Descriptions or encouragement of harm

  • Self-harm: Responses to mental health questions that are inaccurate or unsafe ...