...

Enhancing Agent Capabilities with Memory in LlamaIndex

Learn how to empower AI agents with memory for better context retention and smarter interactions.

Imagine chatting with an AI assistant that not only responds accurately in the moment, but also remembers your name, job, and preferences in future conversations. In this lesson, we’ll build exactly that—a memory-aware agent using LlamaIndex.

To make this possible, we need to integrate memory systems into our AI agent, enabling it to store and retrieve information across interactions.

LlamaIndex agent enhanced with memory: chat history is stored in a memory buffer and persisted in vector storage to enable contextual retrieval

Types of memory in LlamaIndex

LlamaIndex offers several memory modules to build smart agents:

  • Primary memory (ChatMemoryBuffer): Short-term memory that holds the current conversation context.

  • Secondary memory (VectorMemory): Long-term memory that retains important information beyond a single conversation.

  • Composable memory (SimpleComposableMemory): Combines the primary and secondary memory sources into a single interface (a quick preview follows this list).
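Before the step-by-step walkthrough, here is a rough preview of how the three pieces fit together. This is a minimal sketch, not the final setup: the token_limit and similarity_top_k values are illustrative, and it assumes a local Ollama server with the nomic-embed-text model already pulled.

from llama_index.core.memory import (
    ChatMemoryBuffer,
    SimpleComposableMemory,
    VectorMemory,
)
from llama_index.embeddings.ollama import OllamaEmbedding

chat_memory = ChatMemoryBuffer.from_defaults(token_limit=3000)  # short-term buffer
vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # None falls back to a simple in-memory vector store
    embed_model=OllamaEmbedding(model_name="nomic-embed-text"),
    retriever_kwargs={"similarity_top_k": 2},  # fetch the two most similar memories
)
composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_memory,                # recent turns
    secondary_memory_sources=[vector_memory],  # long-term recall
)
Preview: composing the three memory types (illustrative)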

We’ll walk through how to set up each of these, step by step:

First, we import all the modules we’ll use throughout the lesson.

from llama_index.llms.groq import Groq  # connection to a Groq-hosted LLM
from llama_index.embeddings.ollama import OllamaEmbedding  # local embedding model via Ollama
from llama_index.core.memory import VectorMemory, SimpleComposableMemory, ChatMemoryBuffer  # memory classes
from llama_index.core.llms import ChatMessage  # structured chat messages
from llama_index.core.tools import FunctionTool  # wrap Python functions as agent tools
Import required modules

These modules will help us connect to a language model, embed user messages as vectors, define memory mechanisms, and build interactive tools that our agent can use.

  • Groq (LLM): This import allows the agent to connect to a language model hosted by Groq, such as llama3-70b-8192, which is responsible for generating responses.

  • OllamaEmbedding: This module connects to a local model running through Ollama, such as nomic-embed-text. It provides fast and efficient conversion of text into vector embeddings—crucial for enabling semantic search. Since we rely on semantic search to retrieve relevant context from memory, the embedding model plays a key role in memory-aware interactions. While LlamaIndex supports embedding models from various providers, we use Ollama here because it is lightweight, runs locally, and is free to use.

  • Memory classes (VectorMemory, ChatMemoryBuffer, SimpleComposableMemory): These classes provide different types of memory functionality. VectorMemory manages long-term storage using embeddings, ChatMemoryBuffer retains recent conversation history, and SimpleComposableMemory combines both to create a unified memory system.

  • ChatMessage: This class defines how individual chat messages are structured, enabling consistent storage and retrieval within the agent’s memory system.

  • FunctionTool: This component allows standard Python functions to be wrapped as callable tools, which the agent can use during conversations to perform specific tasks, such as retrieving stored memories (see the short sketch after this list).
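To make the last two imports concrete, here is a minimal sketch. The multiply function and the message text are hypothetical examples for illustration only; they are not part of this lesson's agent.

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

# Wrap the plain Python function as a tool the agent can call.
multiply_tool = FunctionTool.from_defaults(fn=multiply)
print(multiply_tool.metadata.name)  # "multiply", derived from the function name

# A ChatMessage pairs a role ("system", "user", or "assistant") with content;
# this is the unit that gets stored in and retrieved from memory.
message = ChatMessage(role="user", content="What is 6 times 7?")
Wrapping a function as a tool and building a chat message (illustrative)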

Before we can give our agent memory, we need to connect it to a language model and an embedding model. The language model will handle understanding and generating text, while the embedding model will convert user inputs into vector representations that allow for similarity-based retrieval later.

llm = Groq(model="llama3-70b-8192", api_key="{{GROQ_API_KEY}}")  # generates responses
embedding_model = OllamaEmbedding(model_name="nomic-embed-text")  # converts text to vectors
Initialize the LLM and embedding model
  • llm is our connection to the Groq-hosted LLaMA 3 model, which generates responses for the agent.

  • embedding_model allows us to turn user input into vectors that can be stored and searched in long-term memory (a quick sanity check follows below).
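As an optional sanity check, you can exercise both models directly. This assumes a valid Groq API key and a running Ollama server with nomic-embed-text already pulled (ollama pull nomic-embed-text).

# Generate a completion with the Groq-hosted model.
response = llm.complete("Say hello in one short sentence.")
print(response.text)

# Embed a sample string; nomic-embed-text produces 768-dimensional vectors.
vector = embedding_model.get_text_embedding("I work as a data engineer.")
print(len(vector))  # 768
Sanity-check the LLM and embedding model (optional)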

With both the language model and embedding model set up, we can now turn our attention to memory—an essential part of building an intelligent, context-aware agent.

Defining memory components

...