
Citations, Sources, and Prompt Injection Defense

Learn how to build trust with source citations and defend against the critical security threat of prompt injection.

For an AI application to succeed in real-world settings, it must be not only effective but also responsible. This responsibility rests on two core requirements. The first is trust: users must be confident that the AI’s outputs are accurate, verifiable, and grounded in correct information. The second is security: the system must be protected against attempts by malicious actors to manipulate its behavior.

Achieving trust and security is not an automatic outcome of using a powerful model. It requires deliberate engineering. An ungrounded model can produce plausible falsehoods, undermining user trust. An unprotected model can have its core instructions hijacked, creating serious security risks.

This lesson will equip us with the engineering principles for both pillars. We will learn how to build trust by making our AI cite its sources, and how to build security by defending our applications against the critical vulnerability of prompt injection.

Building trustworthiness: Citations and sources

A trustworthy AI must be able to substantiate its claims. Consider a common and practical scenario to understand why this is so critical. A popular consumer electronics review website uses an AI to answer customer questions based on detailed product manuals. A potential customer asks, “Is the new ‘Innovate’ smartphone waterproof?” The AI, having scanned the manual, confidently answers, “Yes, the Innovate smartphone is fully waterproof.” The customer buys the phone, takes it swimming, and it is immediately damaged by water. The company now faces an angry customer and a potential lawsuit. The problem? The manual stated the phone was “water-resistant for up to 30 minutes at a depth of 1 meter,” a subtle but critical distinction from “fully waterproof.” The AI’s confident but imprecise answer created a serious real-world problem.

This scenario highlights a clear requirement. An unsubstantiated or imprecise claim from an AI can mislead users and create significant risk. Citations are the mechanism that turns a claim into verifiable information. They ...