Building Blocks of AI Agents
Explore the essential components that make LLMs act like true, decision-making AI agents.
Let’s get something straight: an LLM by itself isn’t an agent. It’s a powerful language generator, no doubt, but it doesn’t know how to act, when to search, what tools to use, or what context to carry forward. What actually makes an LLM agentic is the set of augmentations wrapped around it: memory, retrieval, tools, routing logic, and evaluators. This is the infrastructure that enables it to reason, take actions, and adapt across steps.
In this lesson, we’re going to break down those core building blocks. Not the abstract theory—the real, functional pieces you wire together to make LLMs behave like agents. These are the same capabilities used behind the scenes by GPTs, Claude, Gemini, and every popular model out there that supports function calling. And you’ll use them too. Retrieval lets you fetch knowledge. Tools allow you to interact with the world. Memory helps preserve useful information. This is the foundation. Once you understand these, the patterns and workflows later in the course won’t just click—they’ll feel obvious. Let’s unpack this.
What are the main components of an agent?
We’re going to walk through these components one by one. No fluff. Just the core pieces you’ll need to understand if you want to build real, functioning agents. Each one plays a specific role, and together, they turn an LLM from a fancy auto-complete into a system that can solve real problems:
Large language model(s): This is the core of every agent—the part that interprets input, reasons about goals, makes decisions, and generates outputs. It’s what gives the agent “thought.” But on its own, it’s confused, forgetful, and powerless. It can’t access the world, remember past conversations, or perform any real actions. That’s where the rest of the components come in—but it all starts here. Your job as a system designer is to give the LLM the right interfaces and context, so it can operate with clarity and purpose.
Retrieval: Even the best LLMs don’t know everything. Their internal knowledge is fixed at training time. Retrieval lets you bolt on real-time access to documents, data, or long-tail knowledge using vector databases and embedding search. This is how agents “look things up” on demand instead of relying on outdated info. Retrieval-augmented generation (RAG) is the most common setup, and it’s foundational for research agents, customer support bots, and anything that requires source-grounded answers. We’ll sketch a toy retrieval step right after this list.
Tools: Tools are how your agent acquires hands. They can be anything from a calculator to an API call to a Python function. Tools let the LLM take action—query a weather API, fetch user data, trigger actions in your app, or interact with third-party systems. In the OpenAI SDK, this appears as function calling. In CrewAI, tools are wired directly to agents. Without tools, your agent can reason—but not execute. They make agents useful in the real world. A tool-calling sketch follows the list as well.
Memory: LLMs don’t retain past messages unless you explicitly include them in the current prompt. That means no long-term context, no continuity, no personalization. Memory fixes that. It can be short-term (conversation history), medium-term (task state), or long-term (user profiles, preferences, or past outcomes). Memory transforms your agent from a goldfish into an evolving system—one that can adapt, learn, and track objectives over time. The simplest form, conversation history, is also sketched below.
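To make retrieval concrete, here’s a minimal sketch of embedding search over an in-memory list. This is a stand-in for a real vector database: the documents and the retrieve helper are invented for illustration, and text-embedding-3-small is just one embedding model you could use.

```python
import numpy as np
from openai import OpenAI

client = OpenAI(api_key="{{OPENAI_API_KEY}}")

# A toy knowledge base; a real system would use a vector database.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm EST.",
    "Premium plans include priority support and a dedicated account manager.",
]

def embed(texts):
    # Turn a list of strings into embedding vectors.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query embedding.
    query_vector = embed([query])[0]
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("Can I get my money back?"))  # Should surface the refund policy.
```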
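Next, tools. Here’s a minimal sketch of function calling with the OpenAI SDK. The get_weather tool is hypothetical and exists only as a schema here, so the model can decide to call it, but nothing actually executes.

```python
from openai import OpenAI

client = OpenAI(api_key="{{OPENAI_API_KEY}}")

# Describe a tool the model is allowed to call; get_weather is a made-up example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Pittsburgh?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
message = completion.choices[0].message
if message.tool_calls:
    print(message.tool_calls[0].function.name)       # e.g., "get_weather"
    print(message.tool_calls[0].function.arguments)  # e.g., '{"city": "Pittsburgh"}'
```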
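Finally, memory. The simplest version is short-term memory: keep a running message list and resend it with every request. A minimal sketch:

```python
from openai import OpenAI

client = OpenAI(api_key="{{OPENAI_API_KEY}}")

# Short-term memory is just the message list: every turn gets appended,
# and the whole history is resent with each request.
history = [{"role": "system", "content": "You're a helpful assistant."}]

def chat(user_input):
    history.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(model="gpt-4.1", messages=history)
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Sam."))
print(chat("What's my name?"))  # Works only because the first turn is still in history.
```

That’s the basic loadout. These components are always present in serious agentic systems. Now, let’s examine the core component, the LLM, in detail.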
How does the LLM component behave in agentic systems?
In agentic systems, the LLM does more than generate answers—it reasons through problems, makes decisions, and decides when (and how) to take actions like calling tools or using memory. The LLM drives the loop.
In this course, we’ll use the OpenAI SDK to power our agents. Why? Because as of the time of writing, OpenAI’s models, particularly gpt-4.1 and gpt-4o, are still among the best when it comes to function calling, tool use, and structured reasoning. The OpenAI SDK gives us low-level access to model behavior with clean control over how completions are generated and how tool calls are triggered.
Before we look at a minimal example, a note on keys: for this course, you can paste your API key into the widget below; the widget passes it straight into the OpenAI client, so the examples run out of the box. In production, never hard-code secrets. Instead, store the key in an environment variable (e.g., OPENAI_API_KEY) and have your code read it at runtime.
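That production pattern is only a couple of lines; this sketch assumes the key lives in the OPENAI_API_KEY environment variable:

```python
import os
from openai import OpenAI

# Read the secret from the environment instead of hard-coding it.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Shortcut: OpenAI() with no arguments also falls back to the
# OPENAI_API_KEY environment variable automatically.
```

With that noted, let’s take a look at a minimal example to get our bearings: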
```python
from openai import OpenAI

client = OpenAI(api_key="{{OPENAI_API_KEY}}")
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You're a helpful assistant."},
        {"role": "user", "content": "Which NFL organization can be called the greatest of all time?"},
    ],
)

response = completion.choices[0].message.content
print(response)
```
Let’s break down what’s happening in the code above:
Lines 1–3: We import the OpenAI SDK and create a client instance using our API key. This client is what we’ll use to make requests to the model.
Lines 4–10: We send a chat completion request to the gpt-4.1 model. This includes a message history—here, we’ve included just a basic system prompt and a user message. In later examples, this message list will include tool outputs, plans, and other agent loop components.
Line 12: We extract the model’s response from the first choice in the completion. This is where you usually get the assistant’s answer—whether it’s a reply, a plan, or a function call trigger.
It works great, right? Did you get the answer you were looking for? Awesome if you did, and if not, hey, welcome to Steelers Nation. But here’s the part most people miss: .choices[0] doesn’t just give you the message content; it also gives you metadata about why the model stopped generating. That’s controlled by a property called finish_reason, and it’s a game-changer for agentic systems.
When the model finishes generating, it sets finish_reason to one of the following:
"stop"
→ The model reached a natural stopping point (like finishing a sentence)."length"
→ The model hit your maximum token limit."content_filter"
→ The output was blocked or trimmed by a safety filter."tool_calls"
→ The model decided to call a tool."function_call"
→ The legacy version of tool calling, still seen in older APIs.
Why does this matter? Because we don’t want to just trust that the output is “done.” In agentic workflows, we need to know why the model stopped—did it finish the plan, or did it max out its token limit? Did it trigger a tool? Was it a content filter? Let’s print the full .choices[0] object to confirm that finish_reason was indeed “stop” in our case.
```python
from openai import OpenAI

client = OpenAI(api_key="{{OPENAI_API_KEY}}")
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You're a helpful assistant."},
        {"role": "user", "content": "Which NFL organization can be called the greatest of all time?"},
    ],
)

response = completion.choices[0]
print(response)
```
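In a real agent loop, you’d branch on this field rather than just print it. Here’s a minimal sketch, continuing from the completion above; handle_tool_calls is a placeholder for dispatch logic you’d write yourself:

```python
choice = completion.choices[0]

if choice.finish_reason == "tool_calls":
    # The model wants to act; handle_tool_calls is a hypothetical helper.
    handle_tool_calls(choice.message.tool_calls)
elif choice.finish_reason == "length":
    # The reply was cut off; retry with a higher token limit or summarize.
    ...
elif choice.finish_reason == "content_filter":
    # The output was blocked; surface a safe fallback to the user.
    ...
else:  # "stop": a normal, complete answer
    print(choice.message.content)
```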
This is one of the reasons we’re using the raw OpenAI SDK instead of a higher-level wrapper like CrewAI for now: it gives us full visibility into model behavior, which we’ll need when we start layering in tools, retries, and agentic loops.
What if I don’t want to use OpenAI models?
Fair question. Maybe you’re on a different stack. Maybe your org is locked into another provider. Or maybe you’re just not a fan of OpenAI—all good. You’ve got options.
The good news is that many top AI providers have added compatibility with the OpenAI SDK. That means you can still use the same code structure—function calls, chat completions, tool use—with minimal changes. You don’t have to rebuild your whole agent just because you swapped out the model.
To use the OpenAI SDK with another provider, you’ll typically change just three lines in your code:
Update the
base_url
to point to the provider’s OpenAI-compatible endpoint.Use the provider’s API key instead of OpenAI’s.
Swap in the correct
model
name for the system you’re targeting.
Before you jump in, make sure you check the provider’s docs. Not all features (like tool_calls, function_call, or finish_reason) are guaranteed to behave exactly the same; some are partial implementations. Read before you code.
As an example, let’s try using the Gemini models instead of OpenAI models:
```python
from openai import OpenAI

client = OpenAI(
    api_key="{{GEMINI_API_KEY}}",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

completion = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"},
    ],
)

response = completion.choices[0]
print(response)
```
What changed? Just three lines as we said:
Line 4: We changed the key to api_key="{{GEMINI_API_KEY}}", so we can simply drop in our real Gemini API key, which we can acquire from Google AI Studio.
Line 5: We added the base_url=... parameter. This tells the OpenAI SDK to send requests to Google’s Gemini endpoint instead of OpenAI’s.
Line 9: We changed the model to "gemini-2.5-flash". You can pick any Gemini model you want to use.
That’s it. No need to rip apart your stack. If you’re not already using the OpenAI SDK, you may want to call Gemini’s native API directly, but if you are, this is the fastest way to swap models and keep building.
Final thoughts
You’ve now seen what truly makes an LLM agentic—and it’s not just the model. It’s the ecosystem wrapped around it: retrieval for knowledge, tools for action, memory for continuity, and infrastructure that ties it all together. These components transform a raw language model into a dynamic problem-solver. By breaking down each of these parts and walking through real, runnable code, you’ve taken the first step toward building agents that are more than just clever chatbots. As we move forward, keep this foundation in mind; everything else builds on it.