What is LangChain memory?

In artificial intelligence (AI), the concept of memory extends beyond mere data retention: it is the mechanism that allows a system to preserve and reuse acquired experiences, knowledge, and prior interactions. This form of memory is crucial for intelligent systems that conduct intricate dialogues and make decisions, both of which rely heavily on historical context.

With large language models (LLMs), such as those powering conversational agents and chatbots, the need for sophisticated memory becomes evident. By default, an LLM does not remember previous interactions within a conversation, which poses a significant challenge to achieving seamless, contextually relevant dialogue. The LangChain framework addresses this issue by offering a set of tools that equip conversational AI with the ability to maintain continuity across exchanges through memory models tailored for conversational settings. We'll explore the intricacies of LangChain memory, focusing on two pivotal models that facilitate seamless and coherent dialogues in AI applications.

ConversationBufferMemory

This foundational model acts as a straightforward memory repository that captures the entire history of a conversation. Its primary function is to ensure that no detail, regardless of its position in the conversation, is forgotten, enabling the AI system to reference any past exchange at any point. Let's go through a simple example to understand how it works by taking a look at the following code snippet:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

# "chat" is an already initialized chat model, e.g., ChatOpenAI()
conversation = ConversationChain(llm=chat, memory=ConversationBufferMemory())
conversation.predict(input="Answer briefly. What are the first 2 prime numbers?")
conversation.predict(input="And the next 3?")
conversation.predict(input="And the next 4?")

In this example, ConversationChain retains context across multiple calls via ConversationBufferMemory, ensuring that each response considers the entire conversation history. This capability is crucial for dynamic and immersive LangChain applications, particularly where continuity and context matter. Asking the LLM "What are the first 2 prime numbers?" yields "2 and 3." With memory, the follow-up "And the next 3?" intelligently continues the sequence with "5, 7, and 11." This demonstrates LangChain's memory functionality. To view the memory's contents, print(conversation.memory.buffer) reveals the stored conversation, enhancing understanding and control over the interaction flow.

Human: What are the first 2 prime numbers?
AI: The first two prime numbers are 2 and 3.
Human: And the next 3?
AI: The next three prime numbers after 2 and 3 are 5, 7, and 11.
Human: And the next 4?
AI: The next four prime numbers after 2, 3, 5, 7 are 13, 17, 19, and 23.
ConversationBufferMemory memory content
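
If you want to inspect the stored history programmatically, a minimal sketch (assuming the conversation object created in the snippet above) could look like this:

# The raw transcript kept by ConversationBufferMemory, as one string
print(conversation.memory.buffer)

# The same history as a list of message objects (HumanMessage/AIMessage)
print(conversation.memory.chat_memory.messages)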

We can see that maintaining memory is streamlined with LangChain, yet unlimited memory also presents challenges. The LLM inherently lacks memory storage, so LangChain compensates by appending this memory as additional context in each request. As memory expands, so do processing demands and usage costs. To mitigate this, LangChain introduces ConversationBufferWindowMemory, an efficient alternative that curtails these drawbacks by managing memory size.
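
To make the cost implication concrete, here is a small standalone sketch of ConversationBufferMemory used outside a chain; the exchanges are hard-coded for illustration, whereas a chain calls save_context for you after every turn:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
# Every exchange is appended to the buffer...
memory.save_context({"input": "What are the first 2 prime numbers?"},
                    {"output": "The first two prime numbers are 2 and 3."})
memory.save_context({"input": "And the next 3?"},
                    {"output": "The next three prime numbers after 2 and 3 are 5, 7, and 11."})

# ...and the whole buffer is injected into the prompt of every new request,
# so the number of tokens sent to the LLM grows as the conversation gets longer
print(memory.load_memory_variables({})["history"])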

ConversationBufferWindowMemory

Building on the basic concept of conversation memory, this model introduces a "window" mechanism, focusing on a specified range of recent interactions. This selective memory approach is particularly useful for applications where the most relevant context is contained within the latest exchanges, allowing for efficient memory usage and faster retrieval times. Let's take a look at the same example, but this time with ConversationBufferWindowMemory instead.

conversation = ConversationChain(llm=chat, memory=ConversationBufferWindowMemory(k=1))
conversation.predict(input="Answer briefly. What are the first 2 prime numbers?")
conversation.predict(input="And the next 3?")
conversation.predict(input="And the next 4?")

Here, ConversationBufferWindowMemory stores the question and the AI's response in the same format, providing context for subsequent queries, but there's a catch: we initialize the window size k to 1, which means the memory retains only the most recent interaction (one question and its corresponding response) at any given time. Let's go through what happens at every step:

  • Line 2: This executes the first conversation step, asking about the first two prime numbers. ConversationBufferWindowMemory stores this query and the LLM's response ("The first two prime numbers are 2 and 3.") as the current context.

  • Line 3: In the second step, a follow-up question regarding the next three prime numbers is asked. Given the window size of 1, the memory now discards the initial query and its response, retaining only the context from this second interaction and its forthcoming response ("The next three prime numbers after 2 and 3 are 5, 7, and 11."). At this stage, here’s what our memory content is:

Human: And the next 3?
AI: The next three prime numbers after 2 and 3 are 5, 7, and 11.
ConversationBufferWindowMemory memory content
  • Line 4: The third interaction asks for the next four, omitting the words prime numbers from the query, and the initial request that specified prime numbers is no longer in memory. Yet the response correctly identifies prime numbers by referencing the AI's stored reply, "The next three prime numbers after 2 and 3 are 5, 7, and 11." This illustrates how the LLM, leveraging the previous response and smart prompt engineering, efficiently maintains context, reducing the need for extensive memory and saving costs.

Note: Given the windowed memory approach, each new predict call refreshes the memory to hold only the latest exchange. Therefore, despite the sequential nature of the questions, each new query is answered without direct consideration of the entire conversation history. This mechanism ensures efficient memory usage and processing but requires thoughtful construction of follow-up questions to maintain coherence in the conversation flow.
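
To see the window in isolation, here is a comparable standalone sketch with ConversationBufferWindowMemory and k=1, again with hard-coded exchanges:

from langchain.memory import ConversationBufferWindowMemory

window_memory = ConversationBufferWindowMemory(k=1)
window_memory.save_context({"input": "Answer briefly. What are the first 2 prime numbers?"},
                           {"output": "The first two prime numbers are 2 and 3."})
window_memory.save_context({"input": "And the next 3?"},
                           {"output": "The next three prime numbers after 2 and 3 are 5, 7, and 11."})

# With k=1, only the most recent exchange is returned; the first one has been dropped
print(window_memory.load_memory_variables({})["history"])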

In conclusion, while we have looked at the intricacies of ConversationBufferWindowMemory and its application within LangChain for efficient memory management, it's important to recognize that LangChain's repertoire of memory mechanisms extends beyond this. Among these, ConversationTokenBufferMemory and ConversationSummaryMemory stand out as additional tools that developers can explore to further enhance their applications. Each memory model offers unique capabilities tailored to different use cases, from capping the stored history at a token limit to summarizing the broader themes of a conversation.
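
As a rough sketch of how those alternatives are wired in (assuming the same chat model from the earlier snippets), the constructors look roughly like this; both need access to the LLM, which they use to count tokens or to generate summaries:

from langchain.chains import ConversationChain
from langchain.memory import ConversationTokenBufferMemory, ConversationSummaryMemory

# Keep only as much raw history as fits within a token budget
token_memory = ConversationTokenBufferMemory(llm=chat, max_token_limit=100)

# Keep a running, LLM-generated summary of the conversation instead of the raw transcript
summary_memory = ConversationSummaryMemory(llm=chat)

conversation = ConversationChain(llm=chat, memory=summary_memory)
conversation.predict(input="Answer briefly. What are the first 2 prime numbers?")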

Try it yourself

Explore the Jupyter Notebook below to see the LangChain memory mechanisms in action and discover for yourself how they can transform conversational AI applications.

Please note that the notebook cells have been pre-configured to display their outputs for your convenience and to facilitate an understanding of the concepts covered. However, if you have an API key, you are encouraged to engage actively with the material by replacing the placeholder in the second cell. This hands-on approach will allow you to experiment with the memory techniques discussed, providing a more immersive learning experience.

