Enhancing Agent Capabilities with Memory in LlamaIndex
Learn how to empower AI agents with memory for better context retention and smarter interactions.
Imagine chatting with an AI assistant that not only responds accurately in the moment, but also remembers your name, job, and preferences in future conversations. In this lesson, we’ll build exactly that—a memory-aware agent using LlamaIndex.
To make this possible, we need to integrate memory systems into our AI agent, enabling it to store and retrieve information across interactions.
Types of memory in LlamaIndex
LlamaIndex offers several memory modules to build smart agents:
- Primary memory (ChatMemoryBuffer): short-term memory that holds the current conversation context.
- Secondary memory (VectorMemory): long-term memory that retains important information beyond a single conversation.
- Composable memory (SimpleComposableMemory): combines the primary and secondary memory sources into a single interface.
We’ll walk through how to set up each of these, step by step:
First, we import all the modules we’ll use throughout the lesson.
```python
from llama_index.llms.groq import Groq
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.memory import VectorMemory, SimpleComposableMemory, ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool
```
These modules will help us connect to a language model, embed user messages as vectors, define memory mechanisms, and build interactive tools that our agent can use.
- Groq (LLM): lets the agent connect to a language model hosted by Groq, such as llama3-70b-8192, which is responsible for generating responses.
- OllamaEmbedding: connects to a local model running through Ollama, such as nomic-embed-text. It provides fast and efficient conversion of text into vector embeddings, which is crucial for enabling semantic search. Since we rely on semantic search to retrieve relevant context from memory, the embedding model plays a key role in memory-aware interactions. While LlamaIndex supports embedding models from various providers, we use Ollama here because it is lightweight, runs locally, and is free to use.
- Memory classes (VectorMemory, ChatMemoryBuffer, SimpleComposableMemory): provide the different types of memory functionality. VectorMemory manages long-term storage using embeddings, ChatMemoryBuffer retains recent conversation history, and SimpleComposableMemory combines both to create a unified memory system.
- ChatMessage: defines how individual chat messages are structured, enabling consistent storage and retrieval within the agent's memory system.
- FunctionTool: allows standard Python functions to be wrapped as callable tools, which the agent can use during conversations to perform specific tasks, such as retrieving stored memories (see the sketch after this list).
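To make the last two imports concrete, here is a minimal sketch of how a ChatMessage is constructed and how an ordinary Python function can be wrapped as a tool. The get_time helper and the sample message are purely illustrative and not part of the lesson's agent.

```python
from datetime import datetime

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool

# A chat message pairs a role ("user", "assistant", "system") with its content.
message = ChatMessage(role="user", content="Hi, I'm Sam and I work as a data engineer.")

# Any plain Python function can be exposed to the agent as a callable tool.
def get_time() -> str:
    """Return the current time as an ISO-8601 string."""
    return datetime.now().isoformat()

time_tool = FunctionTool.from_defaults(fn=get_time)
```

FunctionTool.from_defaults uses the function's name, signature, and docstring to describe the tool to the LLM, which is why a clear docstring matters.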
Before we can give our agent memory, we need to connect it to a language model and an embedding model. The language model will handle understanding and generating text, while the embedding model will convert user inputs into vector representations that allow for similarity-based retrieval later.
```python
llm = Groq(model="llama3-70b-8192", api_key="{{GROQ_API_KEY}}")
embedding_model = OllamaEmbedding(model_name="nomic-embed-text")
```
- llm is our connection to the Groq-hosted LLaMA 3 model, which generates responses for the agent.
- embedding_model lets us turn user input into vectors that can be stored and searched in long-term memory (a quick check of both models is sketched below).
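As an optional sanity check (assuming a valid Groq API key and a local Ollama server with nomic-embed-text already pulled), you can call both models directly:

```python
# One-off completion from the Groq-hosted model.
response = llm.complete("Say hello in one short sentence.")
print(response.text)

# Embed a sample sentence; the result is a plain list of floats.
vector = embedding_model.get_text_embedding("I prefer vegetarian recipes.")
print(len(vector))  # dimensionality of the embedding vector
```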
With both the language model and embedding model set up, we can now turn our attention to memory—an essential part of building an intelligent, context-aware agent.
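As a preview of where we're headed, here is a minimal sketch, following the composable-memory pattern from the LlamaIndex documentation, of how the three memory classes can be wired together with the embedding_model we just created. The token_limit and similarity_top_k values are illustrative choices, not recommendations.

```python
# Primary (short-term) memory: keeps the most recent turns of the conversation.
chat_memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

# Secondary (long-term) memory: stores messages as embeddings for semantic retrieval.
vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # None -> use the default in-memory vector store
    embed_model=embedding_model,
    retriever_kwargs={"similarity_top_k": 2},
)

# Composable memory: short-term buffer as primary, vector memory as a secondary source.
composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_memory,
    secondary_memory_sources=[vector_memory],
)
```

We'll unpack each of these pieces, and how the agent actually uses them, step by step in the rest of the lesson.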