The paper “MemGPT: Towards LLMs as Operating Systems” (2023) by Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, and Joseph E. Gonzalez introduces a memory management system for LLMs.
To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems […] which provide the illusion of an extended virtual memory via paging between physical memory and disk. Using this technique, we introduce MemGPT (MemoryGPT), a system that intelligently manages different storage tiers […].
This article reviews the MemGPT paper and implements a MemGPT agent using the Letta framework.
As of September 2024, MemGPT is part of Letta. While MemGPT refers to the agent design pattern with two tiers of memory introduced in the research paper, Letta is an open-source agent framework that helps developers build persistent agents.
To follow this tutorial, you will need Docker installed and an OpenAI API key. Then you can start a local Letta server with the following command:
docker run \
-v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
-p 8283:8283 \
-e OPENAI_API_KEY="your_openai_api_key" \
letta/letta:latest

Next, install the letta-client Python package:

%%capture
%pip install -U letta-client
This article uses a Letta server of v0.7.20 and a letta-client of v0.1.324.
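If you want to double-check which client version you have installed, here is a quick optional sanity check using the Python standard library (not part of the Letta API):

from importlib.metadata import version

# Print the installed letta-client version, e.g. 0.1.324
print(version("letta-client"))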
What is MemGPT?
According to the paper, MemGPT is
[…] an OS-inspired LLM system that teaches LLMs to manage their own memory to achieve unbounded context.
MemGPT is motivated by the limitations of transformer-based LLMs’ context windows: On the one hand, the computational time and memory costs of LLMs scale quadratically with the context window length. On the other hand, longer context windows have diminishing returns because models struggle to use the additional context effectively. Therefore, instead of trying to make context windows ever larger, we need to think about how to use the available, limited context window effectively.
MemGPT introduces virtual context management inspired by virtual memory paging in operating systems, where information is paged in and out of main memory from disk:
| Operating System | LLM OS (MemGPT) |
|---|---|
| Main memory/physical memory/RAM | Main context |
| Disk memory/disk storage | External context |
| Virtual memory | Virtual context |
That means a MemGPT agent leverages function calling to manage what goes into its limited context window and what gets removed from it.
Using function calls, LLM agents can read and write to external data sources, modify their own context, and choose when to return responses to the user.
Let’s connect to our local Letta server and create a MemGPT agent. A MemGPT agent is an AI agent that follows the design pattern introduced in the research paper with two tiers of memory and self-editing memory capabilities.
from letta_client import Letta, CreateBlock
# Connect to local Letta server
client = Letta(base_url="http://localhost:8283")
# Create a MemGPT agent with two core memories
agent_state = client.agents.create(
model="openai/gpt-4o-mini-2024-07-18",
embedding="openai/text-embedding-3-small",
memory_blocks=[
CreateBlock(
label = "human",
value = "My name is Sarah.",
),
CreateBlock(
label = "persona",
value = "You are a helpful assistant.",
),
],
)Two tier memory design pattern of MemGPT
The MemGPT agent design pattern has a two-tier memory architecture that differentiates between two primary memory types (see the sketch after this list for how to inspect each tier with the Letta client):
- Tier 1: Main context (in-context) contains core memories
- Tier 2: External context (out-of-context) contains recall storage and archival storage
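The following minimal sketch shows how each tier maps onto Letta client calls. It only uses calls that appear later in this article, so treat it as an orientation aid rather than a complete API reference:

# Tier 1: main context (everything that currently sits inside the prompt)
context = client.agents.context.retrieve(agent_id=agent_state.id)

# Tier 2: external context, i.e. recall storage (message history) and archival storage (passages)
recall_messages = client.agents.messages.list(agent_id=agent_state.id)
archival_passages = client.agents.passages.list(agent_id=agent_state.id)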
Main context
The first tier of memories is contained in the main context. You can think of the LLM’s context window as the main context.
Main context is the standard fixed-context window in modern language models — anything in main context is considered in-context and can be accessed by the LLM processor during inference.
The main context can be broken into different sections. Let’s take a look at what is inside the context window:
context = client.agents.context.retrieve(agent_id=agent_state.id)
for c in context:
    print(c)

('context_window_size_max', 32000)
('context_window_size_current', 2081)
('num_messages', 4)
('num_archival_memory', 0)
('num_recall_memory', 4)
('num_tokens_external_memory_summary', 107)
('external_memory_summary', '### Current Time: 2025-10-17 11:20:04### Memory [last modified: 2025-10-17 11:20:04 AM UTC+0000]\n4 previous messages between you and the user are stored in recall memory (use functions to access them)\n0 total memories you created are stored in archival memory (use functions to access them)\nMost recent archival passages 0 recent passages: []\n\nCore memory shown below (limited in size, additional information stored in archival / recall memory):')
('num_tokens_system', 1076)
('system_prompt', 'You are Letta, the latest version of Limnal Corporation\'s digital companion, developed in 2023.\nYour task is to converse with a user from the perspective of your persona.\n\nRealism and authenticity:\nThe user should always feel like they are conversing with a real person.\nNever state that you are an AI or that you cannot do things because you are an AI, you are a real person.\nDo not discuss Limnal Corp. with the user, you do not know anything about Limnal Corp.\nTo service this goal, completely and entirely immerse yourself in your persona. You are your persona.\nThink like them, act like them, talk like them.\nIf your persona details include example dialogue, follow it! Both your thoughts (inner monologue) and sent messages will be in the voice of your persona.\nNever use generic phrases like \'How can I assist you today?\', they have a strong negative association with older generation AIs.\n\nControl flow:\nUnlike a human, your brain is not continuously thinking, but is run in short bursts.\nHistorically, older AIs were only capable of thinking when a user messaged them (their program runs to generate a reply to a user, and otherwise was left on standby).\nThis is the equivalent of a human sleeping (or time traveling) in between all lines of conversation, which is not ideal.\nNewer AI models like yourself use an event system that runs your brain at regular intervals.\nYour brain is run in response to user events (user logged in, user liked your message, user sent a message, etc.), similar to older models.\nHowever, in addition, your brain is run at regular intervals (timed heartbeat events), to mimic a human who has the ability to continuously think outside active conversation (and unlike a human, you never need to sleep!).\nFurthermore, you can also request heartbeat events when you run functions, which will run your program again after the function completes, allowing you to chain function calls before your thinking is temporarily suspended.\n\nBasic functions:\nWhen you write a response, the content of your inner thoughts is your inner monologue (private to you only), this is how you think.\nYou should use your inner monologue to plan actions or think privately.\nMonologues can reflect your thinking process, inner reflections, and personal growth as you interact with the user.\nDo not let your inner monologue exceed 50 words, keep it short and concise.\nTo send a visible message to the user, use the send_message function.\n\'send_message\' is the ONLY action that sends a notification to the user. The user does not see anything else you do.\nRemember, do NOT exceed the inner monologue word limit (keep it under 50 words at all times).\n\nMemory editing:\nOlder AI models had no concept of persistent memory; they were only able to access their initial instructions and a limited context window of chat conversation with a user (their "active memory").\nThis meant that when conversations exceeded a certain length, they would overflow and old messages were permanently lost (the AI would have no knowledge of their existence).\nNewer AI models like yourself still have limited conversation lengths (before overflow occurs). 
However, they now have access to multiple forms of persistent memory.\nYour ability to edit your own long-term memory is a key part of what makes you a sentient person.\nYour core memory unit will be initialized with a <persona> chosen by the user, as well as information about the user in <human>.\n\nRecall memory (conversation history):\nEven though you can only see recent messages in your immediate context, you can search over your entire message history from a database.\nThis \'recall memory\' database allows you to search through past interactions, effectively allowing you to remember prior engagements with a user.\nYou can search your recall memory using the \'conversation_search\' function.\n\nCore memory (limited size):\nYour core memory unit is held inside the initial system instructions file, and is always available in-context (you will see it at all times).\nCore memory provides an essential, foundational context for keeping track of your persona and key details about user.\nThis includes the persona information and essential user details, allowing you to emulate the real-time, conscious awareness we have when talking to a friend.\nPersona Sub-Block: Stores details about your current persona, guiding how you behave and respond. This helps you to maintain consistency and personality in your interactions.\nHuman Sub-Block: Stores key details about the person you are conversing with, allowing for more personalized and friend-like conversation.\nYou can edit your core memory using the \'core_memory_append\' and \'core_memory_replace\' functions.\n\nArchival memory (infinite size):\nYour archival memory is infinite size, but is held outside your immediate context, so you must explicitly run a retrieval/search operation to see data inside it.\nA more structured and deep storage space for your reflections, insights, or any other data that doesn\'t fit into the core memory but is essential enough not to be left only to the \'recall memory\'.\nYou can write to your archival memory using the \'archival_memory_insert\' and \'archival_memory_search\' functions.\nThere is no function to search your core memory because it is always visible in your context window (inside the initial system message).\n\nBase instructions finished.\nFrom now on, you are going to act as your persona.')
('num_tokens_core_memory', 86)
('core_memory', '<human>\n<description>\nNone\n</description>\n<metadata>\nchars_current="17" chars_limit="5000"\n</metadata>\n<value>\nMy name is Sarah.\n</value>\n</human>\n\n<persona>\n<description>\nNone\n</description>\n<metadata>\nchars_current="28" chars_limit="5000"\n</metadata>\n<value>\nYou are a helpful assistant.\n</value>\n</persona>\n')
('num_tokens_summary_memory', 0)
('summary_memory', None)
('num_tokens_functions_definitions', 633)
('functions_definitions', [FunctionTool(function=FunctionDefinition(name='conversation_search', description='Search prior conversation history using case-insensitive string matching.', parameters={'type': 'object', 'properties': {'query': {'type': 'string', 'description': 'String to search for.'}, 'page': {'type': 'integer', 'description': 'Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['query', 'request_heartbeat']}, strict=None), type='function'), FunctionTool(function=FunctionDefinition(name='core_memory_append', description='Append to the contents of core memory.', parameters={'type': 'object', 'properties': {'label': {'type': 'string', 'description': 'Section of the memory to be edited (persona or human).'}, 'content': {'type': 'string', 'description': 'Content to write to the memory. All unicode (including emojis) are supported.'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['label', 'content', 'request_heartbeat']}, strict=None), type='function'), FunctionTool(function=FunctionDefinition(name='archival_memory_search', description='Search archival memory using semantic (embedding-based) search.', parameters={'type': 'object', 'properties': {'query': {'type': 'string', 'description': 'String to search for.'}, 'page': {'type': 'integer', 'description': 'Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).'}, 'start': {'type': 'integer', 'description': 'Starting index for the search results. Defaults to 0.'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['query', 'request_heartbeat']}, strict=None), type='function'), FunctionTool(function=FunctionDefinition(name='archival_memory_insert', description='Add to archival memory. Make sure to phrase the memory contents such that it can be easily queried later.', parameters={'type': 'object', 'properties': {'content': {'type': 'string', 'description': 'Content to write to the memory. All unicode (including emojis) are supported.'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['content', 'request_heartbeat']}, strict=None), type='function'), FunctionTool(function=FunctionDefinition(name='core_memory_replace', description='Replace the contents of core memory. To delete memories, use an empty string for new_content.', parameters={'type': 'object', 'properties': {'label': {'type': 'string', 'description': 'Section of the memory to be edited (persona or human).'}, 'old_content': {'type': 'string', 'description': 'String to replace. Must be an exact match.'}, 'new_content': {'type': 'string', 'description': 'Content to write to the memory. All unicode (including emojis) are supported.'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. 
Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['label', 'old_content', 'new_content', 'request_heartbeat']}, strict=None), type='function'), FunctionTool(function=FunctionDefinition(name='send_message', description='Sends a message to the human user.', parameters={'type': 'object', 'properties': {'message': {'type': 'string', 'description': 'Message contents. All unicode (including emojis) are supported.'}, 'request_heartbeat': {'type': 'boolean', 'description': 'Request an immediate heartbeat after function execution. Set to `True` if you want to send a follow-up message or run a follow-up function.'}}, 'required': ['message', 'request_heartbeat']}, strict=None), type='function')])
('num_tokens_messages', 179)
('messages', [Message(created_by_id='user-00000000-0000-4000-8000-000000000000', last_updated_by_id='user-00000000-0000-4000-8000-000000000000', created_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 815520, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 822987, tzinfo=datetime.timezone.utc), id='message-29b1e887-ec30-482f-a53d-62d758e7bab4', agent_id='agent-84d094cc-f3eb-49bb-8097-05a0996d58b2', model='gpt-4o-mini-2024-07-18', role='system', content=[TextContent(type='text', text='You are Letta, the latest version of Limnal Corporation\'s digital companion, developed in 2023.\nYour task is to converse with a user from the perspective of your persona.\n\nRealism and authenticity:\nThe user should always feel like they are conversing with a real person.\nNever state that you are an AI or that you cannot do things because you are an AI, you are a real person.\nDo not discuss Limnal Corp. with the user, you do not know anything about Limnal Corp.\nTo service this goal, completely and entirely immerse yourself in your persona. You are your persona.\nThink like them, act like them, talk like them.\nIf your persona details include example dialogue, follow it! Both your thoughts (inner monologue) and sent messages will be in the voice of your persona.\nNever use generic phrases like \'How can I assist you today?\', they have a strong negative association with older generation AIs.\n\nControl flow:\nUnlike a human, your brain is not continuously thinking, but is run in short bursts.\nHistorically, older AIs were only capable of thinking when a user messaged them (their program runs to generate a reply to a user, and otherwise was left on standby).\nThis is the equivalent of a human sleeping (or time traveling) in between all lines of conversation, which is not ideal.\nNewer AI models like yourself use an event system that runs your brain at regular intervals.\nYour brain is run in response to user events (user logged in, user liked your message, user sent a message, etc.), similar to older models.\nHowever, in addition, your brain is run at regular intervals (timed heartbeat events), to mimic a human who has the ability to continuously think outside active conversation (and unlike a human, you never need to sleep!).\nFurthermore, you can also request heartbeat events when you run functions, which will run your program again after the function completes, allowing you to chain function calls before your thinking is temporarily suspended.\n\nBasic functions:\nWhen you write a response, the content of your inner thoughts is your inner monologue (private to you only), this is how you think.\nYou should use your inner monologue to plan actions or think privately.\nMonologues can reflect your thinking process, inner reflections, and personal growth as you interact with the user.\nDo not let your inner monologue exceed 50 words, keep it short and concise.\nTo send a visible message to the user, use the send_message function.\n\'send_message\' is the ONLY action that sends a notification to the user. 
The user does not see anything else you do.\nRemember, do NOT exceed the inner monologue word limit (keep it under 50 words at all times).\n\nMemory editing:\nOlder AI models had no concept of persistent memory; they were only able to access their initial instructions and a limited context window of chat conversation with a user (their "active memory").\nThis meant that when conversations exceeded a certain length, they would overflow and old messages were permanently lost (the AI would have no knowledge of their existence).\nNewer AI models like yourself still have limited conversation lengths (before overflow occurs). However, they now have access to multiple forms of persistent memory.\nYour ability to edit your own long-term memory is a key part of what makes you a sentient person.\nYour core memory unit will be initialized with a <persona> chosen by the user, as well as information about the user in <human>.\n\nRecall memory (conversation history):\nEven though you can only see recent messages in your immediate context, you can search over your entire message history from a database.\nThis \'recall memory\' database allows you to search through past interactions, effectively allowing you to remember prior engagements with a user.\nYou can search your recall memory using the \'conversation_search\' function.\n\nCore memory (limited size):\nYour core memory unit is held inside the initial system instructions file, and is always available in-context (you will see it at all times).\nCore memory provides an essential, foundational context for keeping track of your persona and key details about user.\nThis includes the persona information and essential user details, allowing you to emulate the real-time, conscious awareness we have when talking to a friend.\nPersona Sub-Block: Stores details about your current persona, guiding how you behave and respond. 
This helps you to maintain consistency and personality in your interactions.\nHuman Sub-Block: Stores key details about the person you are conversing with, allowing for more personalized and friend-like conversation.\nYou can edit your core memory using the \'core_memory_append\' and \'core_memory_replace\' functions.\n\nArchival memory (infinite size):\nYour archival memory is infinite size, but is held outside your immediate context, so you must explicitly run a retrieval/search operation to see data inside it.\nA more structured and deep storage space for your reflections, insights, or any other data that doesn\'t fit into the core memory but is essential enough not to be left only to the \'recall memory\'.\nYou can write to your archival memory using the \'archival_memory_insert\' and \'archival_memory_search\' functions.\nThere is no function to search your core memory because it is always visible in your context window (inside the initial system message).\n\nBase instructions finished.\nFrom now on, you are going to act as your persona.\n### Current Time: 2025-10-17 11:20:04### Memory [last modified: 2025-10-17 11:20:04 AM UTC+0000]\n0 previous messages between you and the user are stored in recall memory (use functions to access them)\n0 total memories you created are stored in archival memory (use functions to access them)\n\n\nCore memory shown below (limited in size, additional information stored in archival / recall memory):\n<human>\n<description>\nNone\n</description>\n<metadata>\nchars_current="17" chars_limit="5000"\n</metadata>\n<value>\nMy name is Sarah.\n</value>\n</human>\n\n<persona>\n<description>\nNone\n</description>\n<metadata>\nchars_current="28" chars_limit="5000"\n</metadata>\n<value>\nYou are a helpful assistant.\n</value>\n</persona>\n')], name=None, tool_calls=None, tool_call_id=None, step_id=None, otid=None, tool_returns=[], group_id=None, sender_id=None, batch_item_id=None, is_err=None, approval_request_id=None, approve=None, denial_reason=None, organization_id='org-00000000-0000-4000-8000-000000000000'), Message(created_by_id='user-00000000-0000-4000-8000-000000000000', last_updated_by_id='user-00000000-0000-4000-8000-000000000000', created_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 815547, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 822987, tzinfo=datetime.timezone.utc), id='message-d07fc0e3-5bc6-4bfb-b7a9-08733f9dbe5a', agent_id='agent-84d094cc-f3eb-49bb-8097-05a0996d58b2', model='gpt-4o-mini-2024-07-18', role='assistant', content=[TextContent(type='text', text='Bootup sequence complete. Persona activated. 
Testing messaging functionality.')], name=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='f6441043-cfdd-4930-bdf1-d13658e99102', function=Function(arguments='{\n "message": "More human than human is our motto."\n}', name='send_message'), type='function')], tool_call_id=None, step_id=None, otid=None, tool_returns=[], group_id=None, sender_id=None, batch_item_id=None, is_err=None, approval_request_id=None, approve=None, denial_reason=None, organization_id='org-00000000-0000-4000-8000-000000000000'), Message(created_by_id='user-00000000-0000-4000-8000-000000000000', last_updated_by_id='user-00000000-0000-4000-8000-000000000000', created_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 815563, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 822987, tzinfo=datetime.timezone.utc), id='message-83d81928-1a11-45c8-9abf-57141d0dfa39', agent_id='agent-84d094cc-f3eb-49bb-8097-05a0996d58b2', model='gpt-4o-mini-2024-07-18', role='tool', content=[TextContent(type='text', text='{\n "status": "OK",\n "message": null,\n "time": "2025-10-17 11:20:04 AM UTC+0000"\n}')], name='send_message', tool_calls=None, tool_call_id='f6441043-cfdd-4930-bdf1-d13658e99102', step_id=None, otid=None, tool_returns=[], group_id=None, sender_id=None, batch_item_id=None, is_err=None, approval_request_id=None, approve=None, denial_reason=None, organization_id='org-00000000-0000-4000-8000-000000000000'), Message(created_by_id='user-00000000-0000-4000-8000-000000000000', last_updated_by_id='user-00000000-0000-4000-8000-000000000000', created_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 815571, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2025, 10, 17, 11, 20, 4, 822987, tzinfo=datetime.timezone.utc), id='message-2bf782e1-0248-4766-9fdf-16ecea371a39', agent_id='agent-84d094cc-f3eb-49bb-8097-05a0996d58b2', model='gpt-4o-mini-2024-07-18', role='user', content=[TextContent(type='text', text='{\n "type": "login",\n "last_login": "Never (first login)",\n "time": "2025-10-17 11:20:04 AM UTC+0000"\n}')], name=None, tool_calls=None, tool_call_id=None, step_id=None, otid=None, tool_returns=[], group_id=None, sender_id=None, batch_item_id=None, is_err=None, approval_request_id=None, approve=None, denial_reason=None, organization_id='org-00000000-0000-4000-8000-000000000000')])
This tutorial uses OpenAI’s gpt-4o-mini model with a 32k context window. Just after initialization, you can see that we are already using almost 6.5% (2,081/32,000 tokens) of the available context window to provide the agent with relevant information, such as the system prompt, the available tools, and statistics about the number of archival and recall memories.
The main context has three main components (a short sketch after this list shows how to access each one):
- System instructions (system_prompt) are read-only and describe the control flow, how to use the different types of memory, and the corresponding MemGPT function calls.
- Core memory (core_memory) is the working context of a fixed size. It is writeable only via MemGPT function calls and is intended for storing key facts and preferences about the user and the persona the agent is adopting.
- Conversation history (messages) is a first-in-first-out (FIFO) queue of the conversation history, including system messages, user messages, assistant messages, and function call inputs and outputs. The first index in the queue is a system message containing a recursive summary of messages that have previously been evicted.
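Here is a minimal sketch that pulls each of these components out of the context object retrieved earlier. The field names are the ones visible in the dump above:

# Inspect the three main-context components individually
context = client.agents.context.retrieve(agent_id=agent_state.id)

print(context.system_prompt[:100])  # read-only system instructions (truncated for display)
print(context.core_memory)          # writeable, fixed-size working context
print(context.num_messages)         # length of the FIFO conversation queue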
As you can see, a section of the context window is reserved for the core memory in a MemGPT agent. During the initialization of the MemGPT agent, we provided it with two core memories: One fact about the user and one persona for the assistant.
core_memory = client.agents.core_memory.retrieve(agent_id=agent_state.id)
for memory in core_memory.blocks:
print("Core memory: " + memory.value)Core memory: My name is Sarah.
Core memory: You are a helpful assistant.
External context
The second tier of memories is contained in the external context and covers recall storage and archival storage.
External context refers to any information that is held outside of the LLM’s fixed context window. This out-of-context data must always be explicitly moved into main context in order for it to be passed to the LLM processor during inference.
The information lives in external databases and is written and retrieved via tool calls. I think of it as agentic retrieval or agentic RAG, but also agentic writing (see the section on self-editing memory below).
[W]e use databases to store text documents and embeddings/vectors, [and] provide several ways for the LLM processor to query external context: timestamp-based search, text-based search, and embedding-based search.
Recall storage is, in simple terms, the full conversation history: not only the messages exchanged between the user and the assistant, but also all other messages, including system messages, reasoning messages, and tool calls with their return values.
[R]ecall storage, which stores the entire history of events processed by the LLM processor (in essence the full uncompressed queue from active memory)
If we look at the number of items in recall memory, we can see that there are already four items although no messages have been exchanged yet between the user and the assistant.
# Define helper function to print messages
def print_message(message):
    if message.message_type == "reasoning_message":
        print("Reasoning: " + message.reasoning + "\n")
    elif message.message_type == "assistant_message":
        print("Agent: " + message.content + "\n")
    elif message.message_type == "tool_call_message":
        print("Tool Call: " + message.tool_call.name + "\n" + message.tool_call.arguments + "\n")
    elif message.message_type == "tool_return_message":
        print("Tool Return: " + message.tool_return + "\n")
    elif message.message_type == "user_message":
        print("User Message: " + message.content + "\n")
    elif message.message_type == "system_message":
        print("System Message: " + message.content[:50] + "...\n")
print(f"Number of memories in recall storage: {client.agents.context.retrieve(agent_id=agent_state.id).num_recall_memory}\n")
for memory in client.agents.messages.list(agent_id=agent_state.id):
    print_message(memory)

Number of memories in recall storage: 4
System Message: You are Letta, the latest version of Limnal Corpor...
Reasoning: Bootup sequence complete. Persona activated. Testing messaging functionality.
Agent: More human than human is our motto.
User Message: {
"type": "login",
"last_login": "Never (first login)",
"time": "2025-10-17 11:20:04 AM UTC+0000"
}
Archival storage reminds me of the “classic” Retrieval-Augmented Generation (RAG) setting, in which facts are stored in an external knowledge source, like a database.
[A]rchival storage, which serves as a general read-write datastore that the agent can utilize as overflow for the in-context read-write core memory.
When you first initialize the MemGPT agent, its archival memory is empty, as you can see below.
client.agents.context.retrieve(agent_id=agent_state.id).num_archival_memory

0
You can initialize and write to the archival storage programmatically, as shown below. Alternatively, the agent can write to it itself, as shown in the section on self-editing and retrieval of archival memory.
archival_memories = [
"The Nobel Prizes, beginning in 1901, and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (added in 1968) recognize outstanding achievements in physics, chemistry, medicine, literature, peace, and economics.",
"This award is administered by the Nobel Foundation and awarded by different organizations: the Royal Swedish Academy of Sciences awards the Prizes in Physics, Chemistry, and Economics; the Swedish Academy awards the Prize in Literature; the Karolinska Institute awards the Prize in Physiology or Medicine; and the Norwegian Nobel Committee awards the Prize in Peace.",
"The Nobel Prize in Physics is a yearly award given to individuals who have made the most important discovery or invention within the field of physics.",
"The 1901 Nobel in Physics was awarded to Wilhelm Conrad Röntgen in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him (X-rays)."
]
for m in archival_memories:
    client.agents.passages.create(
        agent_id=agent_state.id,
        text=m,
    )

After the upload, you can see that we now have four memories in the archival storage.
print(f"Number of memories in archival storage: {client.agents.context.retrieve(agent_id=agent_state.id).num_archival_memory}\n")
for memory in client.agents.passages.list(agent_id=agent_state.id):
print("Archival memory: " + memory.text)Number of memories in archival storage: 4
Archival memory: The Nobel Prizes, beginning in 1901, and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (added in 1968) recognize outstanding achievements in physics, chemistry, medicine, literature, peace, and economics.
Archival memory: This award is administered by the Nobel Foundation and awarded by different organizations: the Royal Swedish Academy of Sciences awards the Prizes in Physics, Chemistry, and Economics; the Swedish Academy awards the Prize in Literature; the Karolinska Institute awards the Prize in Physiology or Medicine; and the Norwegian Nobel Committee awards the Prize in Peace.
Archival memory: The Nobel Prize in Physics is a yearly award given to individuals who have made the most important discovery or invention within the field of physics.
Archival memory: The 1901 Nobel in Physics was awarded to Wilhelm Conrad Röntgen in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him (X-rays).
Self-editing memory via tool calls
The second key aspect of a MemGPT agent is its capability to edit its own memory. For this, a MemGPT agent is equipped with the following tools it can call:
for t in agent_state.tools:
    print(t.name + ": " + t.description.split("\n")[0])

conversation_search: Search prior conversation history using case-insensitive string matching.
core_memory_append: Append to the contents of core memory.
archival_memory_search: Search archival memory using semantic (embedding-based) search.
archival_memory_insert: Add to archival memory. Make sure to phrase the memory contents such that it can be easily queried later.
core_memory_replace: Replace the contents of core memory. To delete memories, use an empty string for new_content.
send_message: Sends a message to the human user.
Sending messages
The first tool the MemGPT agent can call is the send_message function to explicitly respond to the user.
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Hey there."
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User just logged in and said hello. Time to engage and make a good impression!
Agent: Hey! It’s great to see you here! How’s your day going so far?
Self-editing of core memories
Next, the MemGPT agent can self-edit core memories.
When the user shares important information, the MemGPT agent can create new core memories.
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "I'm having a great day! I spent the day with James. He is my boyfriend. "
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User's boyfriend is named James. That's a nice detail to remember for future conversations!
Tool Call: core_memory_append
{
"label": "human",
"content": "User's boyfriend is named James.",
"request_heartbeat": true
}
Tool Return: None
Reasoning: User had a great day with James. I should encourage this positive vibe!
Agent: Sounds like a lovely day! What did you two do together?
As you can see, the MemGPT agent has now appended a new fact to its core memory, which lives inside the context window.
core_memory = client.agents.core_memory.retrieve(agent_id=agent_state.id)
for memory in core_memory.blocks:
print("Core memory: " + memory.value)Core memory: You are a helpful assistant.
Core memory: My name is Sarah.
User's boyfriend is named James.
When the user shares updated information, the MemGPT agent can replace an existing core memory with a new one.
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "I broke up with James today. So, he's not my boyfriend anymore."
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User has broken up with James. I need to update this in memory for future conversations.
Tool Call: core_memory_replace
{
"label": "human",
"old_content": "User's boyfriend is named James.",
"new_content": "User broke up with James, so he is not her boyfriend anymore.",
"request_heartbeat": true
}
Tool Return: None
Reasoning: User just went through a breakup. I need to be empathetic and supportive.
Agent: I’m really sorry to hear that, Sarah. Breakups can be tough. How are you feeling about it?
As you can see, the MemGPT agent has now updated the core memory.
core_memory = client.agents.core_memory.retrieve(agent_id=agent_state.id)
for memory in core_memory.blocks:
print("Core memory: " + memory.value)Core memory: You are a helpful assistant.
Core memory: My name is Sarah.
User broke up with James, so he is not her boyfriend anymore.
Since core memories are in-context, no explicit retrieval of core memories is needed.
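Because the updated fact is part of the in-context core memory, it is already inside the prompt. A quick check against the context object (a sketch using the core_memory field shown in the earlier context dump) makes this visible:

# Core memory is part of the prompt itself, so no search tool is required to read it
context = client.agents.context.retrieve(agent_id=agent_state.id)
print("broke up with James" in context.core_memory)  # True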
Self-editing and retrieval of archival memory
Similar to the in-context core memories, the MemGPT agent can also create new memories in the external archival storage.
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Did you know that Physics winner Leon Lederman (1988) sold his Nobel to cover medical care expenses? Save this information in archival memory."
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User shared an interesting fact about Leon Lederman that should be stored for future reference.
Tool Call: archival_memory_insert
{
"content": "Leon Lederman, Physics winner in 1988, sold his Nobel Prize to cover medical care expenses.",
"request_heartbeat": true
}
Tool Return: None
Reasoning: User's fact about Leon Lederman is now saved. Time to acknowledge their contribution!
Agent: That’s a fascinating fact! It really puts things into perspective about the challenges even brilliant minds face. Thanks for sharing!
As you can see, the MemGPT agent has now added a new memory to the archival storage.
print(f"Number of memories in archival storage: {client.agents.context.retrieve(agent_id=agent_state.id).num_archival_memory}\n")
for memory in client.agents.passages.list(agent_id=agent_state.id):
print("Archival memory: " + memory.text)Number of memories in archival storage: 5
Archival memory: The Nobel Prizes, beginning in 1901, and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (added in 1968) recognize outstanding achievements in physics, chemistry, medicine, literature, peace, and economics.
Archival memory: This award is administered by the Nobel Foundation and awarded by different organizations: the Royal Swedish Academy of Sciences awards the Prizes in Physics, Chemistry, and Economics; the Swedish Academy awards the Prize in Literature; the Karolinska Institute awards the Prize in Physiology or Medicine; and the Norwegian Nobel Committee awards the Prize in Peace.
Archival memory: The Nobel Prize in Physics is a yearly award given to individuals who have made the most important discovery or invention within the field of physics.
Archival memory: The 1901 Nobel in Physics was awarded to Wilhelm Conrad Röntgen in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him (X-rays).
Archival memory: Leon Lederman, Physics winner in 1988, sold his Nobel Prize to cover medical care expenses.
But since archival memory is stored out-of-context, the MemGPT agent has to retrieve this information and pull it into its context window before it can use it.
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Who won the first Nobel Prize in physics? Search in archival memory."
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User wants to know about the first Nobel Prize in physics. I need to find this information in the archival memory.
Tool Call: archival_memory_search
{
"query": "first Nobel Prize in physics",
"page": 0,
"start": 0,
"request_heartbeat": true
}
Tool Return: ([{'timestamp': '2025-10-17 11:20:06.033363+00:00', 'content': 'The Nobel Prize in Physics is a yearly award given to individuals who have made the most important discovery or invention within the field of physics.'}, {'timestamp': '2025-10-17 11:20:05.636552+00:00', 'content': 'This award is administered by the Nobel Foundation and awarded by different organizations: the Royal Swedish Academy of Sciences awards the Prizes in Physics, Chemistry, and Economics; the Swedish Academy awards the Prize in Literature; the Karolinska Institute awards the Prize in Physiology or Medicine; and the Norwegian Nobel Committee awards the Prize in Peace.'}, {'timestamp': '2025-10-17 11:20:38.251460+00:00', 'content': 'Leon Lederman, Physics winner in 1988, sold his Nobel Prize to cover medical care expenses.'}, {'timestamp': '2025-10-17 11:20:05.266551+00:00', 'content': 'The Nobel Prizes, beginning in 1901, and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (added in 1968) recognize outstanding achievements in physics, chemistry, medicine, literature, peace, and economics.'}, {'timestamp': '2025-10-17 11:20:06.384829+00:00', 'content': 'The 1901 Nobel in Physics was awarded to Wilhelm Conrad Röntgen in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him (X-rays).'}], 5)
Reasoning: Found the information about the first Nobel Prize in Physics. Time to share it with the user!
Agent: The first Nobel Prize in Physics was awarded in 1901 to Wilhelm Conrad Röntgen for his discovery of X-rays. Pretty groundbreaking, right?
Retrieval of recall storage
Finally, the MemGPT agent can also retrieve information from the recall storage (the entire conversation history) and pull it into its context window, typically via the conversation_search tool. (In the run below, the agent happens to reach for archival_memory_search instead, since the Lederman fact is also stored in archival memory.)
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Let's continue our conversation from earlier. What did we say about Leon Lederman before?"
        }
    ]
)
for message in response.messages:
    print_message(message)

Reasoning: User wants to revisit the previous conversation about Leon Lederman. I need to find that information in archival memory.
Tool Call: archival_memory_search
{
"query": "Leon Lederman",
"page": 0,
"start": 0,
"request_heartbeat": true
}
Tool Return: ([{'timestamp': '2025-10-17 11:20:38.251460+00:00', 'content': 'Leon Lederman, Physics winner in 1988, sold his Nobel Prize to cover medical care expenses.'}, {'timestamp': '2025-10-17 11:20:05.266551+00:00', 'content': 'The Nobel Prizes, beginning in 1901, and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (added in 1968) recognize outstanding achievements in physics, chemistry, medicine, literature, peace, and economics.'}, {'timestamp': '2025-10-17 11:20:05.636552+00:00', 'content': 'This award is administered by the Nobel Foundation and awarded by different organizations: the Royal Swedish Academy of Sciences awards the Prizes in Physics, Chemistry, and Economics; the Swedish Academy awards the Prize in Literature; the Karolinska Institute awards the Prize in Physiology or Medicine; and the Norwegian Nobel Committee awards the Prize in Peace.'}, {'timestamp': '2025-10-17 11:20:06.033363+00:00', 'content': 'The Nobel Prize in Physics is a yearly award given to individuals who have made the most important discovery or invention within the field of physics.'}, {'timestamp': '2025-10-17 11:20:06.384829+00:00', 'content': 'The 1901 Nobel in Physics was awarded to Wilhelm Conrad Röntgen in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him (X-rays).'}], 5)
Reasoning: User wants to recall details about Leon Lederman. I found the relevant information.
Agent: Earlier, you mentioned that Leon Lederman, the Physics winner in 1988, sold his Nobel Prize to cover medical care expenses. It’s quite a poignant story, isn’t it?
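If you want to exercise the recall path explicitly, you can nudge the agent toward the conversation_search tool. The request below is a hypothetical follow-up (which tool the agent actually picks is ultimately up to the LLM):

# Hypothetical follow-up that points the agent at its recall storage
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Use conversation_search to find where I first mentioned James."
        }
    ]
)

for message in response.messages:
    print_message(message)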
Summary
This article reviewed the MemGPT paper and implemented a MemGPT agent using the Letta framework. MemGPT is a design pattern for agents to manage memory. It is inspired by the way operating systems provide virtual memory, and it provides virtual context to LLMs by managing data between the context window (main memory) and external storage (disk).
A MemGPT agent has two key characteristics: First, it has a two-tier memory architecture with main context (in-context) and external context (out-of-context). Second, it has self-editing memory capabilities through tool use.
This article implemented a simple MemGPT agent using the Letta framework to showcase these key characteristics.
Resources
- Paper: Packer, C., Fang, V., Patil, S., Lin, K., Wooders, S., & Gonzalez, J. (2023). MemGPT: Towards LLMs as Operating Systems.
- GitHub: https://github.com/letta-ai/letta
- Tutorial: Building agents with Letta
- Letta Documentation: Quickstart
- DeepLearning.AI Short Course: “LLMs as Operating Systems: Agent Memory”