LangChain Memory Tutorial 2026: Complete Implementation Guide
Introduction to LangChain Memory Tutorial 2026
Imagine you are talking to a friend, and they remember what you said just moments ago. This helps you have a smooth and natural conversation. Computers, especially AI programs, need to do the same thing.
This “remembering” for AI is called memory, and it’s super important for making smart computer helpers. Without memory, an AI would forget everything you said after each message, making conversations very confusing. Think of it like a friend with very short-term memory, asking “Who are you?” every time you speak.
LangChain is a powerful toolkit that helps build these smart AI programs, and it has great tools for memory. This tutorial will teach you how LangChain helps your AI remember things. You will learn how to make your AI agents smarter and more useful by giving them memory, just like a person.
Understanding Memory Types in LangChain
Just like people remember different things in different ways, LangChain offers various types of memory. Each type helps your AI remember information in a special way, depending on what you need it to do. Choosing the right memory type is a crucial part of building an efficient AI helper.
Some memories store every word you say, while others just keep a summary or only the most recent bits. This section will introduce you to the main ways LangChain can help your AI remember. By understanding memory types, you can make smarter choices for your projects.
Here’s a quick look at some common memory types you’ll encounter in LangChain:
| Memory Type | What it does | When to use it |
|---|---|---|
| ConversationBufferMemory | Stores the entire conversation history, word for word. | Short, simple chats where you need full context. |
| ConversationSummaryMemory | Summarizes the conversation history, keeping only key points. | Long conversations where full history is too much, but context is key. |
| ConversationBufferWindowMemory | Stores only the last k turns of the conversation. | Chats where only recent context matters, and you want to save cost. |
| VectorStoreRetrieverMemory | Stores facts or documents in a special database for deep recall. | When the AI needs to recall specific facts from a large knowledge base. |
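If it helps to see these side by side in code, here is a minimal sketch of how each type is created, assuming the classic langchain.memory interfaces used throughout this tutorial and an OpenAI API key in your environment:

```python
from langchain.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
)
from langchain_community.llms import OpenAI

# Full transcript, word for word.
buffer_memory = ConversationBufferMemory()

# Running summary; needs an LLM to write the summary text.
summary_memory = ConversationSummaryMemory(llm=OpenAI(temperature=0))

# Only the last k exchanges.
window_memory = ConversationBufferWindowMemory(k=3)

# VectorStoreRetrieverMemory is the odd one out: it needs a retriever built
# on a vector store, which we cover in its own section later in this tutorial.
```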
Deep Dive into ConversationBufferMemory Implementation
ConversationBufferMemory is the simplest form of memory in LangChain. It works like a notepad where your AI writes down every single thing that is said. This means it remembers both what you say and what the AI says back.
This memory type is easy to use and great for simple, short conversations where you need the AI to recall everything directly. However, if conversations get very long, this memory can grow very big, which can become expensive. Starting with this memory type gives you the foundation for everything else in this tutorial.
Let’s see how to use ConversationBufferMemory in a practical example. First, you need to import the necessary parts from LangChain.
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
import os
# Make sure you have your OpenAI API key set up as an environment variable
# For example: os.environ["OPENAI_API_KEY"] = "your_api_key_here"
# This tutorial assumes you have an OpenAI key configured or will use a placeholder LLM.
# 1. Create an instance of ConversationBufferMemory
memory = ConversationBufferMemory()
# 2. Add some messages to the memory (simulating a conversation)
memory.save_context({"input": "Hi there!"}, {"output": "Hello! How can I help you today?"})
memory.save_context({"input": "What's the weather like?"}, {"output": "I'm sorry, I don't have access to real-time weather information."})
# 3. Load the remembered messages
# This will return a dictionary with the full conversation history.
current_conversation = memory.load_memory_variables({})
print(current_conversation)
When you run this code, you will see a dictionary containing the entire conversation. The key history holds all the messages. This demonstrates how the memory keeps track of everything.
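To make that concrete, here is roughly what the loaded memory looks like for the two turns above, plus how to inspect the underlying message objects. The exact wording is whatever you saved; the shape is what matters.

```python
# Approximate shape of load_memory_variables({}) — by default the history is one string:
# {'history': "Human: Hi there!\nAI: Hello! How can I help you today?\nHuman: What's the weather like?\nAI: ..."}

# You can also look at the raw message objects stored underneath:
for message in memory.chat_memory.messages:
    print(type(message).__name__, "->", message.content)

# Prefer structured messages over one big string? Construct the memory with:
# memory = ConversationBufferMemory(return_messages=True)
```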
Now, let’s connect this memory to a ConversationChain to make an AI that remembers. This chain will use your ConversationBufferMemory to keep the chat flowing naturally. You will need an LLM (Large Language Model) to make it work.
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
import os
# It's good practice to set your API key as an environment variable
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY" # Replace with your actual key or set it in your environment
# Initialize the Large Language Model (LLM)
llm = OpenAI(temperature=0) # temperature=0 makes the AI less creative and more factual
# Initialize the memory
memory_buffer = ConversationBufferMemory()
# Create a ConversationChain with the LLM and memory
conversation = ConversationChain(
    llm=llm,
    memory=memory_buffer,
    verbose=True  # verbose=True shows you what's happening behind the scenes
)
print("Starting conversation with ConversationBufferMemory:")
# First interaction
response1 = conversation.predict(input="Hi there! My name is Alex. What's yours?")
print(f"AI: {response1}\n")
# Second interaction; the buffer carries the earlier turn (including your name) into this prompt
response2 = conversation.predict(input="Can you tell me a fun fact about giraffes?")
print(f"AI: {response2}\n")
# Third interaction, testing if it remembers the previous fact
response3 = conversation.predict(input="That's interesting! Do they have long necks because they eat leaves from tall trees?")
print(f"AI: {response3}\n")
# Let's check the full memory after the conversation
print("\n--- Full Conversation History ---")
print(memory_buffer.load_memory_variables({}))
In this example, the ConversationChain automatically adds the new inputs and outputs to the memory_buffer. When you ask about giraffes and then follow up, the AI uses the entire history from ConversationBufferMemory to understand your questions. This is a basic but powerful way to make your AI conversational.
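Two small housekeeping tricks are worth knowing before moving on: you can inspect the raw messages the chain has accumulated, and you can wipe the buffer when a session ends. A quick sketch, reusing the memory_buffer object from above:

```python
# Inspect the individual message objects the chain has stored so far.
for message in memory_buffer.chat_memory.messages:
    print(f"{type(message).__name__}: {message.content}")

# Clear the buffer so the next conversation starts with a blank history.
memory_buffer.clear()
print(memory_buffer.load_memory_variables({}))  # {'history': ''}
```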
ConversationSummaryMemory for Long Conversations
While ConversationBufferMemory is simple, it can quickly become very expensive for long chats. Every message adds to the total number of words the AI has to read and process. Imagine a chat that goes on for hours; the AI would have to re-read everything each time!
This is where ConversationSummaryMemory comes in handy. Instead of storing every single message, it creates a summary of the conversation so far. This summary then replaces the old, long chat history, keeping the overall memory much shorter. It’s like having a friend who just gives you the main points of a long story.
This helps reduce the “token” usage, which is how AI models count words, and thus saves money. It’s an excellent solution for making long-running AI assistants more efficient, and it is crucial for building cost-effective LangChain applications.
Let’s see how to implement ConversationSummaryMemory. You will need an LLM to create the summary.
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
import os
# Initialize the LLM for both the main conversation and for summarization
llm_for_conversation = OpenAI(temperature=0.7) # A bit more creative for conversation
llm_for_summary = OpenAI(temperature=0) # Less creative, more factual for summarization
# Initialize the ConversationSummaryMemory with an LLM for summarization
summary_memory = ConversationSummaryMemory(llm=llm_for_summary)
# Simulate a longer conversation
summary_memory.save_context({"input": "Hello! I'm here to talk about my travel plans."}, {"output": "Great! Where are you thinking of going?"})
summary_memory.save_context({"input": "I'd love to visit Paris next summer. Any tips?"}, {"output": "Paris is wonderful! Consider visiting the Eiffel Tower, Louvre Museum, and trying local pastries."})
summary_memory.save_context({"input": "That sounds lovely! I also heard about some beautiful castles near Paris. Any recommendations?"}, {"output": "Absolutely! Château de Versailles and Château de Chambord are magnificent and popular day trips."})
print("--- Current Summary Memory (before refresh) ---")
print(summary_memory.load_memory_variables({}))
print("\n--- Simulating more conversation and summary update ---")
# ConversationSummaryMemory folds every call to save_context into its running summary,
# so adding more turns below refreshes the summary automatically.
# In a real chain, this happens on every interaction.
summary_memory.save_context({"input": "Okay, Versailles sounds intriguing. How far is it from Paris?"}, {"output": "Versailles is about 20 kilometers (12 miles) southwest of Paris. It's easily accessible by train."})
summary_memory.save_context({"input": "And what about the best time to visit Versailles to avoid crowds?"}, {"output": "Early mornings or late afternoons on weekdays are usually best. Tuesdays can be very crowded as the Louvre is closed."})
print("\n--- Summary after more interactions ---")
# The summary should now reflect the updated conversation, not just the raw messages.
print(summary_memory.load_memory_variables({}))
# Now, let's integrate this into a ConversationChain
conversation_with_summary = ConversationChain(
    llm=llm_for_conversation,
    memory=summary_memory,
    verbose=True
)
print("\nStarting conversation with ConversationSummaryMemory:")
response_s1 = conversation_with_summary.predict(input="Hi there! I'm planning a trip to France.")
print(f"AI: {response_s1}\n")
response_s2 = conversation_with_summary.predict(input="I'm interested in castles and history. What should I prioritize?")
print(f"AI: {response_s2}\n")
response_s3 = conversation_with_summary.predict(input="Tell me more about the French Revolution's impact on these castles.")
print(f"AI: {response_s3}\n")
print("\n--- Final Summary from memory after chain interaction ---")
print(summary_memory.load_memory_variables({}))
Notice how ConversationSummaryMemory stores a concise summary rather than the full chat. When you integrate it with a ConversationChain, the chain passes the current conversation and the existing summary to the LLM. The LLM then uses this combined information to generate its response, and the memory updates its summary based on the new turns. This process helps keep the context relevant without overwhelming the AI with old, detailed chat logs. It’s a smart way to manage memory, especially for long-term AI interactions.
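If you are curious what the summary itself looks like at any point, ConversationSummaryMemory keeps the running summary on its buffer attribute, and its predict_new_summary helper is what folds new turns into it. A small sketch, reusing the summary_memory object from the example above:

```python
# The running summary is stored as plain text on the memory object.
print("Current summary:", summary_memory.buffer)

# This is (roughly) what happens internally after each turn: the latest
# human/AI pair is folded into the existing summary by the summarization LLM.
refreshed = summary_memory.predict_new_summary(
    summary_memory.chat_memory.messages[-2:],  # the most recent exchange
    summary_memory.buffer,
)
print("Refreshed summary:", refreshed)
```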
ConversationBufferWindowMemory Usage
Sometimes, you don’t need to remember everything said in a chat, nor do you want a summary. What if you only care about the most recent parts of the conversation? This is where ConversationBufferWindowMemory becomes incredibly useful.
This memory type remembers only the last k number of conversation turns. Think of k as a window size. If k is 3, it will only keep track of the last three exchanges between you and the AI. Older messages simply fall out of the window and are forgotten.
This is perfect for scenarios where the most recent context is the most important, and older context is less relevant. It also helps in cost optimization for memory by keeping the context small and manageable. This memory type is another valuable tool in your LangChain memory toolkit.
Let’s implement ConversationBufferWindowMemory and see it in action. You’ll specify the k value when you create the memory object.
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
import os
# Initialize the LLM
llm_window = OpenAI(temperature=0)
# Initialize ConversationBufferWindowMemory, remembering only the last 2 interactions (k=2)
# An interaction means one user input + one AI output.
window_memory = ConversationBufferWindowMemory(k=2)
print("--- Starting direct memory interaction with k=2 ---")
# Simulate a conversation
window_memory.save_context({"input": "Hello!"}, {"output": "Hi there!"})
print(f"Memory after 1 turn: {window_memory.load_memory_variables({})}")
window_memory.save_context({"input": "How are you?"}, {"output": "I'm doing great!"})
print(f"Memory after 2 turns: {window_memory.load_memory_variables({})}")
window_memory.save_context({"input": "What did you do today?"}, {"output": "I processed some data."})
print(f"Memory after 3 turns (observe the first message disappearing): {window_memory.load_memory_variables({})}")
window_memory.save_context({"input": "That sounds productive!"}, {"output": "It was!"})
print(f"Memory after 4 turns (observe the second message disappearing): {window_memory.load_memory_variables({})}")
# Now, let's integrate this into a ConversationChain.
# Clear the window first so the chain starts from an empty history.
window_memory.clear()
print("\n--- Starting ConversationChain with ConversationBufferWindowMemory (k=2) ---")
conversation_with_window = ConversationChain(
    llm=llm_window,
    memory=window_memory,
    verbose=True
)
# First interaction
response_w1 = conversation_with_window.predict(input="My favorite color is blue.")
print(f"AI: {response_w1}\n")
# Second interaction
response_w2 = conversation_with_window.predict(input="What's your favorite color?")
print(f"AI: {response_w2}\n")
# Third interaction - the "blue" turn is still inside the k=2 window,
# so the AI should still be able to answer this.
response_w3 = conversation_with_window.predict(input="Do you remember what my favorite color is?")
print(f"AI: {response_w3}\n")
# Fourth interaction - the first turn has now slid out of the window,
# so the statement about your favorite color is forgotten.
response_w4 = conversation_with_window.predict(input="What about the color of the sky?")
print(f"AI: {response_w4}\n")
print("\n--- Final Memory State ---")
print(window_memory.load_memory_variables({}))
In the example, by the fourth interaction the turn where you said “My favorite color is blue.” has slid out of the k=2 window, so the AI can no longer recall it. You can adjust k to suit your needs, finding a balance between context and cost. This memory type offers a flexible way to manage short-term context.
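One practical way to pick k is to work backwards from a token budget. The numbers below are illustrative assumptions, not measurements from your app, but the arithmetic shows the idea:

```python
from langchain.memory import ConversationBufferWindowMemory

# Assumed numbers for illustration: adjust them to your own application.
history_token_budget = 600     # tokens you are willing to spend on remembered turns
avg_tokens_per_exchange = 120  # a typical user message plus AI reply in your app

k = max(1, history_token_budget // avg_tokens_per_exchange)  # 600 // 120 = 5
print(f"Use ConversationBufferWindowMemory(k={k})")

window_memory = ConversationBufferWindowMemory(k=k)
```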
Beyond the Basics: Advanced Memory Concepts
So far, you’ve learned about several built-in memory types that cover many common scenarios. However, the world of AI applications is vast and sometimes a standard memory solution just won’t cut it. You might need your AI to remember things in a very specific, unique way.
This is where LangChain’s flexibility shines. You’re not limited to what’s already provided. You can create your own custom memory classes to perfectly fit your application’s needs. This capability is essential for building truly innovative AI experiences, and it is a big part of what this tutorial covers next.
Additionally, AI often needs to remember facts that aren’t part of the direct conversation. Imagine an AI that acts as a knowledgeable expert. For this, memory with vector stores becomes essential, allowing your AI to pull relevant information from a vast library of data. These advanced concepts empower you to build much smarter agents.
Building Custom Memory Classes
There will be times when none of the standard LangChain memory types perfectly fit your needs. Perhaps you want to remember only specific keywords, or store memory in a very particular database. This is when you can create your own custom memory classes. It might sound complex, but it simply means telling LangChain exactly how your AI should remember things.
A custom memory class needs two main methods: load_memory_variables and save_context. load_memory_variables is how your AI fetches what it remembers. save_context is how your AI updates its memory after each interaction. These methods give you full control over the memory process.
Let’s create a very simple custom memory that only remembers the last user’s input, completely ignoring the AI’s responses.
from langchain.memory.chat_memory import BaseChatMemory
from typing import List, Dict, Any

class LastUserInputMemory(BaseChatMemory):
    """
    A custom memory class that only stores the very last user input.
    It ignores AI outputs and older user inputs.
    """
    last_user_input: str = ""
    # ConversationChain's default prompt expects a variable named "history",
    # so we expose our memory under that key.
    memory_key: str = "history"

    @property
    def buffer(self) -> Any:
        """Exposes the raw memory buffer."""
        return self.last_user_input

    @property
    def memory_variables(self) -> List[str]:
        """Define the memory variables this class will expose."""
        return [self.memory_key]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Loads the memory variables.
        In this case, it returns the last stored user input.
        """
        if not self.last_user_input:
            return {self.memory_key: "No previous input."}
        return {self.memory_key: self.last_user_input}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        """
        Saves the context of the conversation.
        Here, we only care about and store the 'input' from the user.
        """
        self.last_user_input = inputs.get("input", "")
        # (If we also wanted the AI's reply, we could store outputs.get("output", "") here.)
        print(f"Debug: Last user input saved as: '{self.last_user_input}'")

    def clear(self) -> None:
        """Clears the memory."""
        self.last_user_input = ""
        print("Memory cleared!")
# Let's test our custom memory directly
print("--- Testing LastUserInputMemory directly ---")
custom_memory = LastUserInputMemory()
custom_memory.save_context({"input": "Hello, world!"}, {"output": "Hi!"})
print(f"Memory after 1st turn: {custom_memory.load_memory_variables({})}")
custom_memory.save_context({"input": "How are you?"}, {"output": "I am fine."})
print(f"Memory after 2nd turn: {custom_memory.load_memory_variables({})}")
custom_memory.save_context({"input": "What's up?"}, {"output": "Not much."})
print(f"Memory after 3rd turn: {custom_memory.load_memory_variables({})}")
print(f"Full memory content: {custom_memory.load_memory_variables({})}")
# Now, integrate it with a ConversationChain
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
import os
llm_custom = OpenAI(temperature=0)
conversation_with_custom_memory = ConversationChain(
    llm=llm_custom,
    memory=custom_memory,  # Use our custom memory here
    verbose=True
)
print("\n--- Starting ConversationChain with LastUserInputMemory ---")
response_c1 = conversation_with_custom_memory.predict(input="My name is Sarah.")
print(f"AI: {response_c1}\n")
response_c2 = conversation_with_custom_memory.predict(input="I like to read books.")
print(f"AI: {response_c2}\n")
# This will likely not remember the name "Sarah" because the memory only keeps the *last* input.
response_c3 = conversation_with_custom_memory.predict(input="What's my name?")
print(f"AI: {response_c3}\n")
# At this point the memory only supplies "I like to read books." as the "history" variable.
# The AI might try to guess or say it doesn't know, based on its base knowledge and the limited memory context.
print("\n--- Final Custom Memory State ---")
print(custom_memory.load_memory_variables({}))
When the chain handles the third question, this custom LastUserInputMemory can only supply the previous input (“I like to read books.”), so the AI has no record of the name “Sarah”. This simple example shows how you can tailor memory precisely. You can create much more complex custom memories to store and retrieve specific data points, interact with external databases, or implement unique summarization logic.
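As another illustration of the same pattern, here is a hypothetical KeywordMemory that keeps only user sentences containing hand-picked keywords — one of the ideas mentioned at the start of this section. The class name, keyword list, and behaviour are all assumptions for the sketch, not a built-in LangChain feature:

```python
from typing import Any, Dict, List
from langchain.memory.chat_memory import BaseChatMemory

class KeywordMemory(BaseChatMemory):
    """Hypothetical memory: keep only user sentences that mention chosen keywords."""

    keywords: List[str] = ["budget", "deadline", "name"]  # assumed keywords for the sketch
    remembered: List[str] = []
    memory_key: str = "history"  # ConversationChain's default prompt expects "history"

    @property
    def memory_variables(self) -> List[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {self.memory_key: "\n".join(self.remembered) or "Nothing noteworthy yet."}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        user_text = inputs.get("input", "")
        if any(word in user_text.lower() for word in self.keywords):
            self.remembered.append(user_text)  # keep only "important" user lines

    def clear(self) -> None:
        self.remembered = []

keyword_memory = KeywordMemory()
keyword_memory.save_context({"input": "My budget is 500 dollars."}, {"output": "Noted!"})
keyword_memory.save_context({"input": "Nice weather today."}, {"output": "It is!"})
print(keyword_memory.load_memory_variables({}))  # only the budget sentence survives
```

The same skeleton works for writing memory to an external database: save_context issues the write, and load_memory_variables performs the read.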
Memory with Vector Stores
Sometimes, an AI needs to recall facts or information that wasn’t part of the direct conversation. Imagine building an AI assistant for a large company. It needs to answer questions about company policies, product details, or past projects. This information is usually stored in documents, not in chat history.
This is where memory with vector stores comes in. Instead of just remembering chat turns, the AI can “remember” a vast library of facts. It does this by turning documents into special numerical codes called “embeddings” and storing them in a vector store (a type of database). When you ask a question, the AI converts your question into an embedding too, then searches the vector store for the most similar facts.
This allows the AI to effectively “look up” relevant information from a huge knowledge base, even if it has never been discussed before. It’s a game-changer for building truly knowledgeable AI agents and a key technique in any advanced LangChain memory setup.
Here’s a basic example using VectorStoreRetrieverMemory with a simple in-memory vector store (Chroma).
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore import InMemoryDocstore
from langchain.schema import Document
import os
# 1. Prepare some "facts" that our AI should remember
long_story = """
The Great Library of Alexandria was one of the largest and most significant libraries of the ancient world.
It was located in Alexandria, Egypt, and was part of a larger research institution called the Mouseion.
The library flourished under the patronage of the Ptolemaic dynasty and housed hundreds of thousands of scrolls.
It was a center of scholarship and attracted many prominent scholars and thinkers.
Tragically, the library suffered several fires and ultimately faced decline and destruction over several centuries.
Its loss is considered a major cultural tragedy, as countless ancient texts and knowledge were lost forever.
"""
# 2. Split the long story into smaller chunks for better retrieval
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(long_story)
# 3. Create a vector store from these chunks.
# We'll use OpenAIEmbeddings to turn text into numerical vectors.
# And Chroma as our in-memory vector store for simplicity.
# For production, you'd use a persistent vector store like Pinecone, Weaviate, etc.
# Ensure OpenAI API key is set for embeddings
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# Convert texts to Document objects
docs = [Document(page_content=t) for t in texts]
# Initialize embeddings model
embeddings = OpenAIEmbeddings()
# Create a Chroma vector store from the documents
vectorstore = Chroma.from_documents(docs, embeddings)
# 4. Create a retriever from the vector store.
# A retriever helps to find relevant documents from the vector store.
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1)) # k=1 means retrieve the top 1 most relevant document
# 5. Initialize VectorStoreRetrieverMemory with the retriever.
# The memory only needs the retriever; the retrieved snippets are injected
# into the chain's prompt, and the chain's own LLM generates the response.
retriever_memory = VectorStoreRetrieverMemory(retriever=retriever)
# We still need an LLM for the ConversationChain itself.
llm_vector_memory = OpenAI(temperature=0)
# 6. Create a ConversationChain using this special memory
conversation_with_vector_memory = ConversationChain(
    llm=llm_vector_memory,
    memory=retriever_memory,
    verbose=True
)
print("--- Starting Conversation with VectorStoreRetrieverMemory ---")
# Ask a question that requires recalling facts from the 'long_story'
response_v1 = conversation_with_vector_memory.predict(input="Tell me about the Great Library of Alexandria.")
print(f"AI: {response_v1}\n")
response_v2 = conversation_with_vector_memory.predict(input="What was its primary purpose and where was it located?")
print(f"AI: {response_v2}\n")
response_v3 = conversation_with_vector_memory.predict(input="What led to its decline and why is its loss significant?")
print(f"AI: {response_v3}\n")
# Now, ask something that's not in the document, to see how it handles it
response_v4 = conversation_with_vector_memory.predict(input="What's the capital of France?")
print(f"AI: {response_v4}\n")
print("\n--- Memory Content (Vector Store will return relevant snippets) ---")
# The memory itself doesn't store chat history like BufferMemory.
# Instead, it retrieves relevant documents based on the input.
# To see what it retrieves, you'd typically inspect the `retriever.get_relevant_documents` directly.
# For a ConversationChain, the retrieved docs are injected into the prompt.
# You can see this in the 'verbose=True' output, typically under 'context'.
In this example, when you ask about the Library of Alexandria, VectorStoreRetrieverMemory uses the retriever to find the most relevant parts of the long_story from the vectorstore. It then feeds these relevant parts along with your question to the LLM. This allows the AI to answer specific factual questions that it “learned” from the documents, not just from the current chat. This capability is vital for AI agents that need to access and use large amounts of structured and unstructured data.
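To see exactly which snippets the memory is handing to the chain, you can query the retriever yourself. Depending on your LangChain version the call is invoke or get_relevant_documents; the sketch below tries the newer one first and assumes the retriever object from the example above:

```python
query = "Where was the Great Library located?"

# Newer LangChain versions expose retrievers as Runnables (.invoke);
# older ones use .get_relevant_documents.
try:
    relevant_docs = retriever.invoke(query)
except AttributeError:
    relevant_docs = retriever.get_relevant_documents(query)

for doc in relevant_docs:
    print(doc.page_content[:200])
```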
Production Memory Patterns
Building a proof of concept is one thing; deploying an AI application in the real world is another. When you move to “production,” you need to think about how memory works for many users, how it survives restarts of your program, and how to keep it secure. This involves implementing robust production memory patterns.
In production, you can’t just keep memory in your computer’s temporary RAM. It needs to be stored somewhere persistent. This means using databases or special caching services. Also, each user needs their own memory, so conversations don’t get mixed up.
Here are some common ways to handle memory in a real-world application:
- Database Storage: For structured memory (like key-value pairs or summaries), a simple database (e.g., PostgreSQL, MongoDB) can store memory tied to a user ID.
- Key-Value Stores: Services like Redis or Memcached are excellent for storing conversational memory because they are very fast. Each user session can have a unique key.
- Cloud Services: Cloud providers (AWS, Azure, Google Cloud) offer managed database and caching services that scale easily.
- Stateless vs. Stateful: Most AI models are “stateless,” meaning they don’t remember anything on their own. You make them “stateful” by explicitly adding a memory component. This memory component is what you manage in production.
Here’s a conceptual example using Redis for memory storage. Redis is a popular choice for fast, temporary data storage, perfect for chat memory.
from langchain_community.llms import OpenAI
from langchain.chains import ConversationChain
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory  # its buffer will be backed by Redis
import os
# --- IMPORTANT ---
# To run this, you need a running Redis server and the 'redis' Python package.
# You can install redis-py: pip install redis
# You might need to set up a Redis instance (e.g., locally, or via a cloud provider like Redis Labs).
# For local Redis, usually it runs on localhost:6379 by default.
# Example Redis connection string. Adjust if your Redis setup is different.
# For a real application, keep this secure, e.g., in environment variables.
REDIS_URL = "redis://localhost:6379/0" # 0 is the database index
# Initialize the LLM
llm_prod = OpenAI(temperature=0)
# --- Using Redis for storing message history ---
# Each user needs a unique session_id to keep their conversations separate.
# In a web app, this `session_id` would come from the user's login or a unique cookie.
user_session_id_1 = "user_123_chat_session"
user_session_id_2 = "user_456_chat_session" # A different user
# Create chat history objects using Redis
# This stores the raw messages in Redis.
history_user1 = RedisChatMessageHistory(session_id=user_session_id_1, url=REDIS_URL)
history_user2 = RedisChatMessageHistory(session_id=user_session_id_2, url=REDIS_URL)
# Wrap the RedisChatMessageHistory in a ConversationBufferMemory
# ConversationBufferMemory can take a chat_memory object to store its messages.
# This means our buffer memory is now backed by Redis, making it persistent.
memory_prod_user1 = ConversationBufferMemory(
    chat_memory=history_user1,
    return_messages=True  # Return messages as LangChain ChatMessage objects
)
memory_prod_user2 = ConversationBufferMemory(
    chat_memory=history_user2,
    return_messages=True
)
print(f"--- User 1's Memory (Initial Check) ---")
print(memory_prod_user1.load_memory_variables({}))
print(f"--- User 2's Memory (Initial Check) ---")
print(memory_prod_user2.load_memory_variables({}))
# Clear any previous history for this session for a clean start
history_user1.clear()
history_user2.clear()
print("Redis history cleared for both users for this demonstration.")
# Create ConversationChains for each user
conversation_user1 = ConversationChain(
    llm=llm_prod,
    memory=memory_prod_user1,
    verbose=True
)
conversation_user2 = ConversationChain(
    llm=llm_prod,
    memory=memory_prod_user2,
    verbose=True
)
print("\n--- User 1's Conversation ---")
response_p1_u1 = conversation_user1.predict(input="Hello, I'm Alice. I'm learning about LangChain.")
print(f"AI (User 1): {response_p1_u1}\n")
response_p2_u1 = conversation_user1.predict(input="Do you remember my name?")
print(f"AI (User 1): {response_p2_u1}\n")
print("\n--- User 2's Conversation ---")
response_p1_u2 = conversation_user2.predict(input="Hi, I'm Bob. I need help with Python.")
print(f"AI (User 2): {response_p1_u2}\n")
response_p2_u2 = conversation_user2.predict(input="Can you give me a simple Python example?")
print(f"AI (User 2): {response_p2_u2}\n")
print("\n--- Verifying Memories after conversations ---")
print(f"\nUser 1 Memory:\n{memory_prod_user1.load_memory_variables({})}")
print(f"\nUser 2 Memory:\n{memory_prod_user2.load_memory_variables({})}")
# Now, imagine the application restarted, or a long time passed.
# If we re-initialize the memory with the same session_id, it should load the history from Redis.
print("\n--- Re-initializing User 1's memory and checking persistence ---")
reloaded_history_user1 = RedisChatMessageHistory(session_id=user_session_id_1, url=REDIS_URL)
reloaded_memory_user1 = ConversationBufferMemory(
    chat_memory=reloaded_history_user1,
    return_messages=True
)
print(f"User 1's reloaded memory: \n{reloaded_memory_user1.load_memory_variables({})}")
# If Redis is working, you should see the entire conversation for User 1 stored there.
In this Redis example, each user’s conversation history is stored separately in Redis. If the application goes down and comes back up, or if a user reconnects, their conversation history can be loaded directly from Redis using their unique session ID. This ensures that conversations are persistent and isolated between users. This is a fundamental aspect of building robust AI applications for many users.
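In a real web application you would typically hide the session bookkeeping behind a small helper that builds the Redis-backed memory on demand. The helper names below (get_memory_for_session, answer) are hypothetical, but the pattern is the same one the example uses:

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

REDIS_URL = "redis://localhost:6379/0"  # adjust for your deployment

def get_memory_for_session(session_id: str) -> ConversationBufferMemory:
    """Hypothetical helper: build a Redis-backed memory for one user's session."""
    history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
    return ConversationBufferMemory(chat_memory=history, return_messages=True)

def answer(session_id: str, user_input: str, llm) -> str:
    """Rebuild the chain per request; the conversation state lives in Redis, not in the process."""
    chain = ConversationChain(llm=llm, memory=get_memory_for_session(session_id))
    return chain.predict(input=user_input)

# In a web handler you would call answer() with the session id taken from the
# user's login or cookie, e.g. answer("user_123_chat_session", "Hello!", llm_prod)
```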
Cost Optimization for Memory
Memory is not just about functionality; it’s also about money. Every time an AI model processes information, it incurs a cost based on “tokens” (which are like words or parts of words). The more memory you feed to your AI, the more tokens it processes, and the more expensive your application becomes. This is why cost optimization for memory is a critical skill for any developer in 2026 and beyond.
Imagine you’re paying per word your AI reads. If your memory is a huge book, and the AI has to re-read that book for every single response, the cost adds up very quickly! Smart memory management can significantly reduce your operating expenses.
Here are key strategies for cost optimization for memory:
- Choose the Right Memory Type:
  - ConversationBufferMemory: Most expensive for long chats as it grows indefinitely. Use only for short, specific interactions.
  - ConversationSummaryMemory: A good balance. Summarizes old parts, keeping context shorter and cheaper.
  - ConversationBufferWindowMemory: Excellent for cost-saving if only recent context is needed. You control k to manage cost.
- Aggressive Summarization: For ConversationSummaryMemory, you can tune how often summaries are generated and how detailed they are. Shorter, less frequent summaries save tokens.
- Clear Memory: Implement logic to clear memory after a certain period of inactivity, after a session ends, or after a task is completed. This prevents old, irrelevant conversations from costing you money.
- Utilize Cheaper Models for Summarization/Embeddings: For ConversationSummaryMemory or VectorStoreRetrieverMemory, you don’t always need the most powerful (and expensive) LLM to generate summaries or embeddings. Using smaller, cheaper models for these background tasks can save a lot.
- Caching: Cache memory responses or summarization results where possible. If the same summary is needed multiple times, retrieve it from a cache instead of re-generating it.
- Batching: For VectorStoreRetrieverMemory, batching embedding calls can be more efficient than making individual calls, although this is more about embedding cost than prompt cost.
- Trim Context: Before sending the prompt to the LLM, you can programmatically trim the context if it exceeds a certain token limit, prioritizing the most recent or critical information (a small sketch of this appears at the end of this section).
Let’s consider an example where choosing ConversationBufferWindowMemory over ConversationBufferMemory directly impacts cost.
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory
from langchain.chains import ConversationChain
from langchain_core.language_models.llms import LLM

# Assume a very simple LLM interaction cost where each 'token' costs 1 unit.
# (In reality, costs vary, but this illustrates the principle.)
def calculate_approx_tokens(text):
    # A very rough estimate: 1 token ~ 4 characters or ~ 0.75 words.
    # This is just for demonstration purposes. Real tokenizers are more complex.
    return len(text) // 4

# A mock LLM that never calls a real API; it just counts approximate tokens.
class MockLLM(LLM):
    total_cost: int = 0  # Track total simulated cost for this LLM instance

    @property
    def _llm_type(self) -> str:
        return "mock"

    def _call(self, prompt, stop=None, run_manager=None, **kwargs):
        # In a real scenario, this would be an actual API call.
        # Here, we simulate by just returning a generic response.
        print("--- MockLLM Called ---")
        print(f"Prompt sent (first 100 chars): {prompt[:100]}...")
        input_tokens = calculate_approx_tokens(prompt)
        print(f"Approx. Input Tokens: {input_tokens}")
        # Simulate an output
        response = "This is a simulated AI response to your query."
        output_tokens = calculate_approx_tokens(response)
        print(f"Approx. Output Tokens: {output_tokens}")
        self.total_cost += input_tokens + output_tokens  # Add to total cost
        return response

# Initialize our mock LLMs (one per scenario so costs are tracked separately)
mock_llm_buffer = MockLLM()
mock_llm_window = MockLLM()
# Scenario 1: Using ConversationBufferMemory (potentially expensive)
print("--- Scenario 1: Using ConversationBufferMemory ---")
buffer_memory = ConversationBufferMemory()
conversation_buffer = ConversationChain(
    llm=mock_llm_buffer,
    memory=buffer_memory,
    verbose=True
)
for i in range(5):
    print(f"\n--- Turn {i+1} with Buffer Memory ---")
    user_input = f"User message {i+1} about a complex topic that adds to history."
    conversation_buffer.predict(input=user_input)
print(f"\nTotal estimated cost with ConversationBufferMemory: {mock_llm_buffer.total_cost} units")
print(f"Final Buffer Memory content size (char): {len(str(buffer_memory.load_memory_variables({})))}")
# Scenario 2: Using ConversationBufferWindowMemory (cost-optimized)
print("\n--- Scenario 2: Using ConversationBufferWindowMemory (k=2) ---")
window_memory = ConversationBufferWindowMemory(k=2) # Keep only the last 2 turns
conversation_window = ConversationChain(
    llm=mock_llm_window,
    memory=window_memory,
    verbose=True
)
for i in range(5):
    print(f"\n--- Turn {i+1} with Window Memory (k=2) ---")
    user_input = f"User message {i+1} about a complex topic that adds to history."
    conversation_window.predict(input=user_input)
print(f"\nTotal estimated cost with ConversationBufferWindowMemory (k=2): {mock_llm_window.total_cost} units")
print(f"Final Window Memory content size (char): {len(str(window_memory.load_memory_variables({})))}")
# Compare the costs
print("\n--- Cost Comparison ---")
print(f"Buffer Memory total cost: {mock_llm_buffer.total_cost}")
print(f"Window Memory (k=2) total cost: {mock_llm_window.total_cost}")
# You'll observe that as turns increase, the buffer memory cost rises more steeply
# because the prompt length (memory history) grows continuously.
# Window memory's prompt length quickly stabilizes, leading to lower total costs for longer interactions.
In this simulation, ConversationBufferMemory keeps adding to the prompt length with each turn, leading to higher costs. ConversationBufferWindowMemory, however, keeps the prompt length (and thus cost) relatively constant after the initial k turns, making it much more economical for extended conversations. This practical understanding of cost optimization for memory is vital for developing sustainable AI applications.
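Finally, here is a small sketch of the “Trim Context” idea from the list above: keep the newest turns that fit inside a token budget and drop the rest before building the prompt. It reuses the same rough four-characters-per-token estimate as the simulation, so treat the numbers as illustrative:

```python
from typing import List

def approx_tokens(text: str) -> int:
    # Same rough rule of thumb as the simulation above: ~4 characters per token.
    return len(text) // 4

def trim_history(turns: List[str], max_tokens: int = 500) -> List[str]:
    """Keep only the most recent turns that fit inside the token budget."""
    kept, used = [], 0
    for turn in reversed(turns):        # walk from newest to oldest
        cost = approx_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [f"Turn {i}: " + "details " * 30 for i in range(20)]
print(f"Kept {len(trim_history(history))} of {len(history)} turns within the budget")
```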
Best Practices and Future Trends in LangChain Memory (2026 Perspective)
By now, you’ve journeyed through the core aspects of LangChain memory, from simple buffers to custom classes and vector store integration. You’ve also touched on production memory patterns and crucial cost optimization for memory. Armed with this knowledge, you are well-prepared to build smarter, more efficient AI applications.
As we look towards 2026, the field of AI, and specifically LangChain, will continue to evolve rapidly. The fundamentals of memory will remain, but new, more sophisticated approaches are always emerging. Staying current with these memory fundamentals while watching what is new is key.
Here are some best practices and emerging trends to keep in mind:
Best Practices for LangChain Memory:
- Start Simple: Begin with ConversationBufferMemory or ConversationBufferWindowMemory for most basic use cases. Don’t over-engineer memory if it’s not necessary.
- Profile and Optimize: Always monitor your token usage and costs. If you notice memory becoming a bottleneck or too expensive, then explore ConversationSummaryMemory or ConversationBufferWindowMemory with a smaller k.
- Separate Knowledge from Conversation: For factual recall, use VectorStoreRetrieverMemory. Don’t try to cram all your application’s knowledge into the conversational buffer.
- Modular Design: Think of memory as a pluggable component. This makes it easier to swap out different memory types or implement custom solutions without changing your entire application logic.
Future Trends in LangChain Memory (2026 Perspective):
- More Intelligent Summarization: Expect LLMs to become even better at creating concise, lossy summaries that retain critical information for context while drastically cutting token usage. This means summaries that are not just shorter but also smarter.
- Self-Correcting Memory: Future AI systems might have “meta-memory” that allows them to actively manage and refine their own memory stores. They could decide what’s important to keep, what to summarize, and what to forget, reducing human oversight.
- Multimodal Memory: As AI handles more than just text (images, audio, video), memory will need to adapt. Imagine an AI remembering visual cues or audio tones from past interactions to better understand new ones.
- Personalized Memory Graphs: Instead of just linear conversations, memory might be structured as complex “knowledge graphs” unique to each user. This would allow for richer, more deeply personalized AI interactions that adapt over long periods.
- Ephemeral and Persistent Layers: More sophisticated memory systems will likely combine fast, ephemeral memory for immediate context with slower, persistent memory for long-term knowledge, seamlessly switching between them.
The journey with LangChain memory is continuous. The tools and techniques covered in this tutorial provide a strong foundation. By understanding the core concepts and keeping an eye on future developments, you’ll be well-equipped to build the next generation of intelligent AI agents.
Conclusion
You’ve now completed a comprehensive tour of LangChain memory, exploring the vital role memory plays in AI conversations. We started with simple concepts, like ConversationBufferMemory implementation, and moved through ConversationSummaryMemory for long conversations and ConversationBufferWindowMemory usage. You also learned how to build custom memory classes and integrate memory with vector stores for deeper knowledge recall.
Understanding production memory patterns and cost optimization for memory is crucial for deploying real-world applications. These skills will help you create AI systems that are not only smart but also efficient and sustainable. The ability to give an AI the power to remember transforms it from a simple tool into a true conversational partner.
Experiment with these different memory types and see how they change your AI’s behavior. LangChain provides a flexible framework, so don’t be afraid to combine different approaches or create your own unique solutions. The future of AI is exciting, and mastering memory is a key step in building innovative agents that can truly understand and interact with the world around them.