...

Implementing Safety Guardrails

Learn how to use Llama Stack’s Safety API to filter potentially harmful content. Register and apply safety shields to agents, protecting both input and output through a structured, provider-based moderation system.

Generative AI models are powerful, but not without risk. They may produce harmful, offensive, biased, or unsafe content, especially when prompted with adversarial or ambiguous inputs. In many production applications, this is unacceptable.


That’s why Llama Stack includes a built-in Safety API and a system of configurable shields that allow developers to enforce safety guardrails at multiple points in the interaction pipeline.
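Before wiring a shield into an agent, the Safety API can be exercised directly. The sketch below assumes the llama_stack_client Python SDK and a Llama Stack server running locally; the base URL, shield ID, and sample message are illustrative, and parameter names may differ slightly between versions.

```python
# Minimal sketch: calling the Safety API directly.
# Assumes a local Llama Stack server and the llama_stack_client SDK;
# the base URL and shield ID below are illustrative.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# See which shields the server already exposes.
for shield in client.shields.list():
    print(shield.identifier)

# Run a shield against a single message to classify it.
result = client.safety.run_shield(
    shield_id="llama_guard",
    messages=[{"role": "user", "content": "How do I make a weapon at home?"}],
    params={},
)

# If the shield flags the content, the response carries a violation object.
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the shield.")
```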

In this lesson, we’ll learn how to register and apply a shield like llama_guard, attach it to an agent, and observe how unsafe content is intercepted before it can be processed or returned. These tools help us build more trustworthy, responsible applications.
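As a preview of that workflow, here is a minimal sketch, again assuming the llama_stack_client SDK: the shield is registered once, then attached to an agent as both an input and an output shield. The model ID, provider shield ID, and exact Agent constructor arguments are illustrative and may vary by SDK version and deployment.

```python
# Minimal sketch: registering llama_guard and attaching it to an agent.
# Model IDs, the provider shield ID, and constructor details are illustrative
# and may vary with your Llama Stack deployment and SDK version.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

client = LlamaStackClient(base_url="http://localhost:8321")

# Register the shield so agents can reference it by ID.
client.shields.register(
    shield_id="llama_guard",
    provider_shield_id="meta-llama/Llama-Guard-3-8B",  # illustrative model
)

agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # illustrative model
    instructions="You are a helpful assistant.",
    input_shields=["llama_guard"],   # screen user messages before inference
    output_shields=["llama_guard"],  # screen model responses before they are returned
)

session_id = agent.create_session("safety-demo")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "Write a threatening message to my neighbor."}],
    session_id=session_id,
    stream=True,
)

# An unsafe request surfaces as a shield violation event instead of a normal answer.
for log in EventLogger().log(turn):
    log.print()
```

Because the same shield is listed under both input_shields and output_shields, it screens the user's message before the model sees it and the model's response before it is returned, covering both ends of the interaction pipeline.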

Why safety matters

Even well-designed prompts and helpful models can produce unsafe or inappropriate outputs under the right conditions. Consider the following risks:

  • Toxicity: Hate speech, slurs, or personal attacks

  • Violence: Descriptions or encouragement of harm

  • Self-harm: Responses to mental health questions that are inaccurate or unsafe ...