Security, Compliance, and Responsible Operation in LLMOps
Explore how to harden a retrieval-augmented generation system by implementing defense-in-depth architecture, including input and output guardrails, addressing prompt injection, denial-of-service, and sensitive data leaks. Understand security challenges unique to LLMs and apply controls to ensure the system meets production safety requirements while preparing for reliable deployment.
In the previous lesson, we focused on correctness.
We built an evaluation system that measured metrics such as faithfulness to the provided context and used LLM-based evaluators (models used to judge other model outputs) to catch semantic regressions before they reached production.
Correctness and safety are distinct concerns.
Evaluation tells us whether an answer is grounded in the provided context. It does not tell us whether the model should respond to the request at all, whether sensitive data is being exposed, or whether a user is attempting to manipulate the system.
From a security perspective, a production LLM application introduces significant risk surface.
We expose a natural-language interface to a non-deterministic model. The system accepts arbitrary user input, combines it with internal instructions and private data, and returns the output directly to the user, often at our own expense.
In traditional software, this would be equivalent to passing unchecked user input directly into exec() or a SQL query. In LLM systems, this pattern is surprisingly common.
In the deploy phase of the 4D framework, we must shift our mindset from features to hardening. This lesson explains why LLM security is fundamentally different from classical application security and how we build a defense-in-depth architecture that makes a RAG system safe enough to operate on the public internet.
Why LLM security is different
Traditional application security relies on deterministic defenses against deterministic attacks.
Firewalls block ports. Input sanitizers strip known dangerous characters. Web Application Firewalls (WAFs) match signatures like '; DROP TABLE. LLMs do not operate solely on tokens and syntax. They operate on meaning.
Since the input space is natural language, LLMs are susceptible to what we can call cognitive attacks. These are inputs designed to manipulate the model’s reasoning rather than exploit a parser bug.
A well-known example illustrates ...