Introduction to AI Agents
Get an introduction to AI agents, their core features, types, evolution, and when to use them.
The landscape of AI is rapidly evolving, especially with the rise of large language models (LLMs). While early LLM applications focused on single-turn interactions or basic text generation, the field of AI agents, with its roots in earlier rule-based and learning-based systems, has seen a significant resurgence and transformation. A new category of intelligent systems, powered by LLMs, has emerged. These systems are not limited to processing information in isolation. They are designed to carry out complex, multi-step tasks on behalf of users, often without explicit instructions at every step.
In this course, we’ll move beyond individual agents to focus on agentic system design. This means understanding how individual agents are not just standalone entities, but are architected and integrated as part of larger, cohesive systems. Effective agentic system design involves making deliberate architectural choices and utilizing reusable patterns to create reliable, scalable, and adaptive AI solutions. Real-world agentic systems often involve multiple interacting agents, necessitating careful planning, communication, and arbitration as fundamental design aspects.
By the end of this lesson, you will be able to:
Define what makes a system an AI agent.
Distinguish between AI models and agents.
Identify different types of agents and their roles.
Understand the evolution from rule-based to LLM-powered agents.
Evaluate when to use an agent instead of a simpler solution.
What is an AI agent?
In the world of artificial intelligence, the term agent refers to a system that can perceive its environment, make decisions, and act autonomously to achieve specific goals. This concept, often summarized as ‘perceive-reason-act,’ is a cornerstone of classical AI theory, notably popularized by Russell and Norvig. Unlike a passive model that requires a user to query it or interpret its outputs, an AI agent is an active entity. It can sense, think, and act on its own, often without continuous human oversight.
Let’s break this down more concretely. At its core, an AI agent has three fundamental capabilities:
Perception: The agent must be able to sense its environment. This could mean reading text from a user, analyzing images or audio, or retrieving data from sensors or databases. The goal is to extract meaningful information from the raw input.
Reasoning and planning: Once the environment is perceived, the agent must make decisions. This involves understanding context, selecting actions, and planning steps toward a goal. LLMs like GPT-4 are often used here, providing powerful language-based reasoning abilities. Agents can be reactive, responding directly to immediate stimuli, or deliberative, engaging in multi-step planning and reasoning before acting.
Action execution: The agent must then act. This could mean sending a reply, calling an API, triggering a robot’s motion, or updating a database. The key is that the agent’s actions are grounded in its reasoning process and tailored to its goals.
This full pipeline, which includes sensing, interpreting, and acting, captures what makes an AI system agentic rather than reactive. The autonomy levels of agents can vary, from partial automation requiring human approval to full independence, depending on the task and safety requirements.
Here’s a simple real-world analogy:
Imagine a personal AI assistant embedded in your smart home. It hears you say, “Remind me to call mom at 6 PM.” It parses your speech (perception), understands that this is a timed reminder (reasoning), and schedules an alarm for 6 PM (action).
Another analogy that mimics a more complex agentic system:
Picture an AI travel concierge running on your phone. It sees your flight to Berlin has been cancelled (perception), and reasons that you must still reach Berlin tonight. It then checks alternate flights/trains, books the best combo, and messages your hotel about the late arrival (plan and multi-step action).
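To make the loop concrete, here is a minimal sketch in Python of the perceive-reason-act cycle behind the smart-home reminder example. The class and method names are illustrative assumptions, not part of any particular framework.

```python
from dataclasses import dataclass, field

@dataclass
class ReminderAgent:
    """Toy agent showing the perceive-reason-act loop (names are illustrative)."""
    memory: list = field(default_factory=list)  # context kept across turns

    def perceive(self, raw_input: str) -> str:
        # Perception: turn raw input (speech, text, sensor data) into a usable observation.
        return raw_input.strip().lower()

    def decide(self, observation: str) -> dict:
        # Reasoning/planning: choose an action based on the observation (and memory).
        if "remind me" in observation:
            return {"action": "schedule_reminder", "details": observation}
        return {"action": "do_nothing", "details": observation}

    def act(self, decision: dict) -> str:
        # Action execution: carry out the chosen action and record it for later turns.
        self.memory.append(decision)
        if decision["action"] == "schedule_reminder":
            return f"Reminder scheduled: {decision['details']}"
        return "No action taken."

agent = ReminderAgent()
observation = agent.perceive("Remind me to call mom at 6 PM")
print(agent.act(agent.decide(observation)))
```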
Key characteristics of an AI agent are summarized in the following table:
Key Characteristics of an AI Agent
| Feature | Description |
| --- | --- |
| Autonomy | Acts independently, initiating actions and making decisions without continuous human intervention. |
| Goal-Oriented Behavior | Consistently directs actions toward achieving predefined objectives, rather than merely reacting or producing isolated outputs. |
| Perception and Feedback Loop | Continuously observes its environment, processes inputs, and adjusts behavior based on the outcomes or new information. |
| Continuity | Maintains memory or context over time, allowing multi-turn reasoning. |
| Flexibility | Can revise plans and policies when objectives or context change, provided the agent’s perception, memory, and planning modules support it. |
This agent-centric view aligns with how intelligence is typically understood in biological systems. It is not just about recognizing patterns, but about using those insights to take meaningful action.
AI models vs. AI agents
A common point of confusion in AI system design is the distinction between an AI model and an AI agent. Although both are foundational elements in artificial intelligence, they serve very different roles.
Let’s start with the simpler concept:
An AI model is a trained component built to perform a specific function. More precisely, a model is usually an artifact (e.g., a set of learned weights and biases) that a program loads and runs to perform that function. For example:
A classification model predicts whether an email is spam or not.
A text generation model completes a sentence or writes a poem.
A speech recognition model converts audio into text.
These models do not have goals, initiative, or awareness of context beyond the current input. They wait for a prompt, compute an output, and stop. In this sense, models are powerful, but passive.
An AI agent, by contrast, is an active system. It uses one or more models as components within a larger decision-making process. An agent has autonomy, memory, goals, and the ability to take action based on what it observes.
An agent might:
Use a language model to understand instructions.
Call a search tool to gather information.
Store conversation history in a vector database to reuse facts or preferences later.
Monitor its success and adapt its behavior over time.
The key difference is this: the agent takes initiative and operates over time, often within a loop of perception, reasoning, and action.
Comparison Between AI Model and AI Agent
| Feature | AI Model | AI Agent |
| --- | --- | --- |
| Scope | Narrow task | Broader system of behavior |
| Role | Computes outputs from inputs | Chooses actions in pursuit of goals |
| Context Awareness | Limited to the current inference window | Maintains and updates internal context |
| Initiative | Waits for input | Acts autonomously when triggered or scheduled |
| Tool Use | No | Yes, often includes multiple tools and models |
| Example | Sentiment classifier | Customer support assistant that answers, escalates, and logs tickets |
Here’s an analogy: If an AI model is like a calculator, then an AI agent is like a personal assistant. The calculator waits for a formula, and gives a result. The assistant listens to your needs, figures out what to do, takes initiative, asks clarifying questions, and performs actions on your behalf.
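To ground the distinction in code, here is a minimal sketch: the model is a pure input-to-output function, while the agent wraps that model in a loop with a goal, memory, and a tool. The `call_model` and `web_search` functions are hypothetical placeholders for a real LLM API and a real search tool; they return canned strings so the example runs.

```python
def call_model(prompt: str) -> str:
    """An AI model: computes an output from an input, then stops."""
    return f"<model output for: {prompt[:40]}...>"  # placeholder for a real inference call

class ResearchAgent:
    """An AI agent: wraps the model in a loop with a goal, memory, and a tool."""

    def __init__(self, goal: str):
        self.goal = goal
        self.memory: list[str] = []  # context maintained across steps

    def web_search(self, query: str) -> str:
        return f"<search results for: {query[:40]}...>"  # placeholder tool call

    def run(self, max_steps: int = 3) -> str:
        for _ in range(max_steps):
            # Reasoning: ask the model what to do next, given the goal and memory so far.
            plan = call_model(f"Goal: {self.goal}. Context so far: {self.memory}. Next step?")
            # Action: gather more information with a tool and record the observation.
            self.memory.append(self.web_search(plan))
        # Final synthesis once enough context has been collected.
        return call_model(f"Answer the goal '{self.goal}' using: {self.memory}")

print(call_model("Is this email spam?"))                          # model: one shot, then stops
print(ResearchAgent("Summarize recent work on AI agents").run())  # agent: loop toward a goal
```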
Three categories of AI agents
AI agents can take many forms depending on how they interact with their environment. One way to organize this variety is by categorizing agents based on their mode of operation. Broadly, there are three main types:
Software-based agents (sandbox environment)
These agents operate entirely in digital spaces. They interact with users, applications, or data through APIs, web interfaces, documents, or databases.
Examples include:
A chatbot that provides customer service on a retail website.
An email triage agent that categorizes and replies to messages.
A trading bot that monitors financial news and executes trades.
Although these agents are not embodied in the physical world, they can still carry out meaningful actions, such as updating records, sending alerts, or triggering automated workflows.
Key characteristics:
No physical sensors or hardware.
Act through code, interfaces, and digital data.
Often use tools like web search, file access, and language models.
Physical agents (embodied systems)
Physical agents interact with the real world using sensors and actuators. These systems perceive their surroundings through hardware and perform actions by physically affecting the environment.
Examples include:
A domestic robot that cleans your home and responds to voice commands.
An autonomous vehicle that navigates traffic and avoids obstacles.
A robotic arm on a factory floor that assembles products.
These agents require a complex integration of perception (e.g., cameras, microphones, LiDAR), real-time reasoning, and precise control mechanisms.
Key characteristics:
Sense the physical world using embedded sensors.
Act using motors, arms, wheels, or other actuators.
Must handle uncertainty, delay, and safety in real environments.
Hybrid and adaptive agents (real-world integration)
Hybrid agents combine digital and physical capabilities. They work across environments and continuously adapt based on feedback, learning from both structured data and unstructured sensory input over time.
Examples include:
An AI-driven traffic system that analyzes live camera feeds and adjusts signal timing based on congestion.
A healthcare assistant that monitors wearable devices, suggests treatment adjustments, and communicates with doctors.
A warehouse robot that uses online databases to restock inventory and physically moves goods.
These agents often blend software logic with real-world embodiment and rely on learning mechanisms to refine their behavior over time.
Key characteristics:
Operate across both digital and physical domains.
Integrate multi-modal data (text, images, sensor readings).
Adapt and improve through real-world feedback loops, potentially by periodically retraining their perception modules with logged sensor data.
Each of these categories highlights a different level of complexity and environmental interaction. As we move from software-based to hybrid agents, the challenges grow, and so do the opportunities for creating deeply intelligent systems.
Evolution from rule-based systems to LLM-powered agents
The idea of autonomous agents is not new. In fact, early AI systems from decades ago could already make decisions and take actions. However, their capabilities were limited due to the narrow scope of their design. Most early agents were rule-based systems: if X happened, then do Y. This approach worked for clearly defined, repetitive tasks, but it quickly failed in dynamic or unpredictable environments.
Rule-based agents: Fixed logic, limited flexibility
Rule-based (or expert) systems, such as MYCIN and XCON, were prominent in the 1970s and 1980s. In rule-based systems, human developers explicitly define all the rules.
For example:
If the user says “I forgot my password,” then show a password reset form.
If the temperature exceeds 30°C, turn on the fan.
These systems are easy to understand and control. However, they do not generalize well. They cannot infer new rules from data or handle edge cases and ambiguity. Scaling such systems requires writing and maintaining thousands of individual rules by hand.
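The two rules above, written as explicit conditionals, illustrate why this approach does not scale: every new behavior requires another hand-written branch, and anything unanticipated simply falls through. The function names are illustrative.

```python
def handle_user_message(message: str) -> str:
    # Rule 1: a hand-written trigger phrase mapped to a fixed response.
    if "forgot my password" in message.lower():
        return "show_password_reset_form"
    return "no_rule_matched"  # anything the developers did not anticipate falls through

def handle_temperature(celsius: float) -> str:
    # Rule 2: a fixed threshold mapped to a fixed action.
    if celsius > 30:
        return "turn_on_fan"
    return "do_nothing"

print(handle_user_message("I forgot my password"))  # show_password_reset_form
print(handle_temperature(32.5))                     # turn_on_fan
```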
Learning-based agents: Trial and feedback
The next major leap came with machine learning. Learning-based agents, spanning supervised, unsupervised, and reinforcement learning paradigms, shifted rule authoring from explicit human definition to data-driven optimization. While supervised learning models, such as CNNs and RNNs, enabled advances in agent perception (e.g., speech recognition in virtual assistants), reinforcement learning (RL) introduced a powerful new paradigm. It allows agents to learn complex behaviors through trial and error, receiving rewards for good actions and penalties for bad ones. RL is often considered the most ‘agent-like’ subset because of its focus on sequential decision-making in dynamic environments.
AlphaGo’s breakthrough in 2016, for example, demonstrated this power. Because RL agents learn through experimentation, they can discover strategies that human designers might not think to program explicitly.
For example:
AlphaGo learned to defeat world-class Go players by playing millions of games against itself.
Robot arms trained with RL can learn to pick up objects without needing precise location data.
While more flexible than rule-based systems, RL agents often require large-scale training, significant compute resources, and carefully structured environments. They are powerful, but not always practical for general-purpose applications.
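As a rough illustration of trial-and-error learning, here is a toy tabular Q-learning loop on a five-state corridor where moving right eventually earns a reward. It is a minimal sketch of the idea, not how systems like AlphaGo are actually implemented.

```python
import random
from collections import defaultdict

q_table = defaultdict(float)           # (state, action) -> estimated long-term value
actions = ["left", "right"]
alpha, gamma, epsilon = 0.1, 0.9, 0.3  # learning rate, discount, exploration rate

def step(state: int, action: str) -> tuple[int, float]:
    # Toy environment: five states 0..4; reaching state 4 yields a reward of 1.
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == 4 else 0.0)

for episode in range(500):
    state = 0
    while state != 4:
        # Explore occasionally; otherwise pick the action with the highest estimate.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q_table[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted future value.
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
        state = next_state

# The learned policy: the preferred action in each non-terminal state.
print({s: max(actions, key=lambda a: q_table[(s, a)]) for s in range(4)})
```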
LLM-powered agents: Generalization through language
The most recent breakthrough in agent design has come from large language models (LLMs) such as GPT-4 (released March 2023), Claude, and Gemini.
While reinforcement learning discovers optimal policies through iterative reward signals, large language models bring a new dimension: zero-shot generalization. Pre-trained on massive datasets of human language, they internalize broad world knowledge and reasoning capabilities directly within their parameters. This allows them to understand and respond to novel tasks without explicit retraining, and to reason, plan, and coordinate tasks using natural language instructions.
When LLMs are used within agent systems, they enable several powerful capabilities:
Tool use: Selecting and operating the right tool for a given job. Through its reasoning, the agent decides which tool to call and what parameters to pass to it.
Dynamic task planning: Decomposing complex goals into manageable steps and deciding how to approach them.
Conversational memory: Maintaining awareness across multiple turns of dialogue.
Generalization: Handling novel tasks without needing retraining.
For example, a modern LLM-powered agent can handle a request like:
“Summarize this research paper and email the key points to my team.”
The agent can understand the instruction, extract and summarize the content, format the result, and trigger an API call to send the email. None of these behaviors need to be hardcoded in advance.
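A hedged sketch of how such an agent loop might look is shown below. The `fake_llm` function returns a scripted plan so the example runs; in a real system it would be a call to an LLM, and the tool names and JSON protocol are illustrative assumptions rather than any specific framework's API.

```python
import json

SCRIPTED_PLAN = iter([
    {"tool": "read_paper", "args": {"path": "paper.pdf"}},
    {"tool": "summarize", "args": {"text": "<paper text>"}},
    {"tool": "send_email", "args": {"to": "team@example.com", "body": "<key points>"}},
    {"tool": "finish", "args": {"answer": "Summary emailed to the team."}},
])

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns the next planned step as JSON.
    return json.dumps(next(SCRIPTED_PLAN))

TOOLS = {
    "read_paper": lambda path: f"<text of {path}>",         # e.g., PDF extraction
    "summarize": lambda text: "<key points>",                # e.g., another LLM call
    "send_email": lambda to, body: f"email sent to {to}",    # e.g., an email API
}

def run_agent(user_request: str, max_steps: int = 5) -> str:
    history = [f"User request: {user_request}"]
    for _ in range(max_steps):
        # The model decides the next tool call (or declares the task finished).
        decision = json.loads(fake_llm("\n".join(history)))
        if decision["tool"] == "finish":
            return decision["args"]["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(f"{decision['tool']} -> {result}")  # observation fed back as context
    return "Stopped after max_steps without finishing."

print(run_agent("Summarize this research paper and email the key points to my team."))
```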
This level of flexibility marks a major shift. Agents are no longer just reactive systems built on rigid logic. They are evolving into adaptive, collaborative, and tool-using systems that can generalize their behavior across a wide range of problems.
It’s crucial to acknowledge that while LLM-powered agents are immensely powerful, they also come with inherent limitations. These include issues with reliability (e.g., generating inaccurate outputs, often called hallucinations) and interpretability (understanding why an agent made a particular decision). There is also the potential for bias, as well as challenges related to high inference latency and security vulnerabilities. Understanding these limitations is as important as recognizing their capabilities when designing trustworthy agentic systems.
Question: Consider the following scenarios. For each, decide whether a rule-based system or an AI agent would be the more appropriate solution, and why.
Scenario 1: A system that automatically turns off the lights in an office building at 7 PM every weekday.
Scenario 2: A system that responds to customer email inquiries, categorizes them, and escalates complex cases to a human, learning from past interactions.
Scenario 3: A system that flags financial transactions over $10,000 for manual review.
When should you build an agent?
Note: In this course, our focus will be on LLM-powered agents. Whenever we refer to an “agent,” we specifically mean an agent powered by a large language model (LLM).
Not every problem needs an agent. Sometimes, a simple script or rule-based automation is enough. Agents introduce complexity, cost, and often uncertainty, so it’s important to be strategic about when to use them.
You should consider building an agent when your use case involves complex decision-making, dynamic context, or flexible task execution that traditional automation struggles to handle. Below are three key signals that suggest an agent may be the right choice:
The task requires contextual decision-making
Agents shine in situations where outcomes depend on nuanced judgment. If your workflow involves interpreting ambiguous input, balancing trade-offs, or adjusting behavior based on prior interactions, an agent can outperform static rules.
Consider a refund approval process that depends on the customer’s past behavior, the tone of their message, and the reason for return. A rule-based system might miss important cues, while an agent can reason through the situation the way a support representative would.
The rules are too complex to maintain
Some systems grow so large and fragmented that updating them becomes a liability. When your logic involves dozens of conditional branches, exception cases, and special handling for edge conditions, agents offer a more maintainable alternative.
Consider a vendor security review process with evolving compliance requirements and unstructured documentation. Instead of encoding every possible case in logic, an agent can read and interpret the documents as part of the workflow.
The workflow relies on unstructured or natural language data
Agents are uniquely equipped to parse and reason over unstructured data such as documents, emails, and conversations. If your pipeline needs to extract meaning from natural language, an agent may significantly reduce manual effort.
For example, consider an insurance claim intake process where customers describe events in their own words. An agent can extract relevant entities, ask follow-up questions, and route the case accordingly.
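As a sketch, the extraction step of such an intake workflow might look like the following. The `call_llm` function is a placeholder that returns a canned response so the example runs, and the field names are illustrative assumptions.

```python
import json

EXTRACTION_PROMPT = """Extract the following fields from the claim description
as JSON: incident_date, incident_type, estimated_damage, missing_information."""

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned extraction result.
    return json.dumps({
        "incident_date": "2024-05-02",
        "incident_type": "water damage",
        "estimated_damage": "unknown",
        "missing_information": ["estimated repair cost"],
    })

def intake_claim(description: str) -> dict:
    fields = json.loads(call_llm(f"{EXTRACTION_PROMPT}\n\nClaim: {description}"))
    # If anything important is missing, the agent can ask a follow-up question
    # instead of routing the claim immediately.
    fields["next_action"] = "ask_follow_up" if fields["missing_information"] else "route_to_adjuster"
    return fields

print(intake_claim("A pipe burst in my kitchen on May 2nd and flooded the floor."))
```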
Before committing to building an agent, it’s crucial to validate that your use case clearly meets these criteria. If a deterministic solution can suffice, it may be a more appropriate approach.
Summary
AI agents are autonomous systems that perceive, reason, and act to achieve goals. Unlike AI models, which are passive and task-specific, agents operate over time, maintain context, and take initiative.
We explored three types of agents: software-based, physical, and hybrid. We also examined how agent design has evolved from rule-based logic to LLM-powered flexibility.
Agents are most useful when tasks involve complex decisions, dynamic context, or unstructured data. Understanding when and why to build agents sets the stage for designing them effectively.