Enhancing Agent Capabilities with Memory in LlamaIndex
Learn how to empower AI agents with memory for better context retention and smarter interactions.
Imagine chatting with an AI assistant that not only responds accurately in the moment, but also remembers your name, job, and preferences in future conversations. In this lesson, we’ll build exactly that—a memory-aware agent using LlamaIndex.
To make this possible, we need to integrate memory systems into our AI agent, enabling it to store and retrieve information across interactions.
Types of memory in LlamaIndex
LlamaIndex offers several memory modules to build smart agents:
- Primary memory (ChatMemoryBuffer): short-term memory that holds the current conversation context.
- Secondary memory (VectorMemory): long-term memory that retains important information beyond a single conversation.
- Composable memory (SimpleComposableMemory): combines the primary and secondary memory sources into a single interface.
We’ll walk through how to set up each of these, step by step:
First, we import all the modules we’ll use throughout the lesson.
```python
from llama_index.llms.groq import Groq
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.memory import VectorMemory, SimpleComposableMemory, ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool
```
These modules will help us connect to a language model, embed user messages as vectors, define memory mechanisms, and build interactive tools that our agent can use.
- Groq (LLM): lets the agent connect to a language model hosted by Groq, such as llama3-70b-8192, which is responsible for generating responses.
- OllamaEmbedding: connects to a local model running through Ollama, such as nomic-embed-text. It provides fast and efficient conversion of text into vector embeddings, which is crucial for enabling semantic search. Since we rely on semantic search to retrieve relevant context from memory, the embedding model plays a key role in memory-aware interactions. While LlamaIndex supports embedding models from various providers, we use Ollama here because it is lightweight, runs locally, and is free to use.
- Memory classes (VectorMemory, ChatMemoryBuffer, SimpleComposableMemory): provide the different types of memory functionality. VectorMemory manages long-term storage using embeddings, ChatMemoryBuffer retains recent conversation history, and SimpleComposableMemory combines both to create a unified memory system.
- ChatMessage: defines how individual chat messages are structured, enabling consistent storage and retrieval within the agent's memory system.
- FunctionTool: allows standard Python functions to be wrapped as callable tools, which the agent can use during conversations to perform specific tasks, such as retrieving stored memories (see the sketch after this list).
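To make the last two imports concrete, here is a minimal sketch of how a ChatMessage is constructed and how an ordinary Python function can be wrapped as a tool. The get_time helper and the sample message are purely illustrative and not part of the lesson's agent.

```python
from datetime import datetime

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool

# A chat message pairs a role ("user", "assistant", "system") with its content.
message = ChatMessage(role="user", content="Hi, I'm Sam and I work as a data engineer.")

# Any plain Python function can be exposed to the agent as a callable tool.
def get_time() -> str:
    """Return the current time as an ISO-8601 string."""
    return datetime.now().isoformat()

time_tool = FunctionTool.from_defaults(fn=get_time)
```

FunctionTool.from_defaults uses the function's name, signature, and docstring to describe the tool to the LLM, which is why a clear docstring matters.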
Before we can give our agent memory, we need to connect it to a language model and an embedding model. The language model will handle understanding and generating text, while the embedding model will convert user inputs into vector representations that allow for similarity-based retrieval later.
```python
llm = Groq(model="llama3-70b-8192", api_key="{{GROQ_API_KEY}}")
embedding_model = OllamaEmbedding(model_name="nomic-embed-text")
```
- llm is our connection to the Groq-hosted LLaMA 3 model, which generates responses for the agent.
- embedding_model lets us turn user input into vectors that can be stored and searched in long-term memory (a quick check of both models is sketched below).
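As an optional sanity check (assuming a valid Groq API key and a local Ollama server with nomic-embed-text already pulled), you can call both models directly:

```python
# One-off completion from the Groq-hosted model.
response = llm.complete("Say hello in one short sentence.")
print(response.text)

# Embed a sample sentence; the result is a plain list of floats.
vector = embedding_model.get_text_embedding("I prefer vegetarian recipes.")
print(len(vector))  # dimensionality of the embedding vector
```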
With both the language model and embedding model set up, we can now turn our attention to memory—an essential part of building an intelligent, context-aware agent.
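As a preview of where we're headed, here is a minimal sketch, following the composable-memory pattern from the LlamaIndex documentation, of how the three memory classes can be wired together with the embedding_model we just created. The token_limit and similarity_top_k values are illustrative choices, not recommendations.

```python
# Primary (short-term) memory: keeps the most recent turns of the conversation.
chat_memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

# Secondary (long-term) memory: stores messages as embeddings for semantic retrieval.
vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # None -> use the default in-memory vector store
    embed_model=embedding_model,
    retriever_kwargs={"similarity_top_k": 2},
)

# Composable memory: short-term buffer as primary, vector memory as a secondary source.
composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_memory,
    secondary_memory_sources=[vector_memory],
)
```

We'll unpack each of these pieces, and how the agent actually uses them, step by step in the rest of the lesson.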