- AI agents lose all context when a session ends: decisions, preferences, relationships, project history. This is "context death."
- ClawVault's LoCoMo benchmark research found plain markdown files (74%) outperformed specialized memory tools like Mem0 and Zep (68.5%). LLMs already know how to read files.
- The best memory architectures treat agent memory like human knowledge management: typed storage, associative linking, priority-based retrieval.
- Five approaches compared: ClawVault, OpenClaw built-in memory, Mem0, Zep, and raw vector databases. Each fits a different use case.
- Practical steps you can implement today: organize by type, tag by priority, build a vault index, and use budget-aware context injection.
You spend forty minutes teaching your AI agent about a project. You explain the architecture, the constraints, the decisions you already made and why. The agent gets it. It produces excellent work. Then the session ends.
The next morning, you open a new session. The agent has no idea who you are.
Every decision you explained? Gone. Every preference it learned? Gone. The context that made it useful? Completely erased. You are starting from zero. Again.
If you run AI agents in production, you have lived this. It is the single most frustrating limitation of working with autonomous systems, and it has a name: context death.
What Context Death Actually Costs You
Context death is not just annoying. It is expensive.
Consider a typical production agent workflow. An agent helps you manage a software project. Over several sessions, it learns your coding standards, your deployment preferences, the architecture decisions you made six weeks ago. It knows that Sarah prefers REST APIs, that the PostgreSQL migration has a deadline, and that the staging environment breaks if you deploy before running integration tests.
Then the context window resets.
Now multiply that lost context across every agent interaction, every day, across an entire team. The decisions that get revisited because nobody remembers the original reasoning. The inconsistencies that creep in when an agent makes a recommendation without knowing what was already tried and rejected.
A WIRED article from this week captured it perfectly: an OpenClaw agent became "hilariously amnesiac, repeatedly informing me that its context had gotten nuked and asking what we were doing, like a cheerful version of the main character in Memento."
That is the state of the art right now. Brilliant agents with zero long-term memory.
The Benchmark That Changed the Conversation
In early 2026, Pedro from Versatly (@sillydarket on X) published research that caught the AI engineering community off guard. His team ran memory architecture benchmarks on LoCoMo, a standard evaluation for long-conversation memory, and the results contradicted what most people assumed.
Plain markdown files, stored locally and injected into context, scored 74% on the benchmark. Specialized memory tools like Mem0 and Zep, with their vector databases and embedding pipelines, scored 68.5%.
The simpler approach won. By a meaningful margin.
The post went viral: 289K views and 817 likes in days. Engineers who had built complex memory infrastructure started questioning their stack. The insight was both obvious in hindsight and genuinely surprising: LLMs already know how to work with files. They have been trained on billions of markdown documents, YAML configs, and structured text. You do not need a specialized retrieval system to help a language model read a well-organized file.
"The agent memory problem isn't a technology problem. It's a design problem."Pedro, Versatly (@sillydarket)
This does not mean vector databases are useless. For massive-scale retrieval across millions of documents, embeddings are essential. But for practical agent memory, where you need an agent to remember a few hundred decisions and project details, structured files outperform complex tooling.
Five Approaches to Agent Memory, Compared
The agent memory space is moving fast. Here are the five main approaches practitioners are using today, with honest assessments of each.
ClawVault (Versatly)
Obsidian-style markdown vault with knowledge graph.
How it works: Typed markdown files with YAML frontmatter, wiki-links between concepts, and a graph-aware context retrieval system. Observational memory automatically extracts decisions, preferences, and lessons from conversations. Semantic search via qmd with BM25 + vector + neural reranker.
Strengths: Open source, local-first, zero cloud dependency, Obsidian-compatible, framework-agnostic. Works with OpenClaw hooks or standalone. Install with npm install -g clawvault.
Limitations: Requires Node.js 18+ and qmd. Newer project, smaller community than managed alternatives.
OpenClaw Built-in Memory
MEMORY.md + structured memory files + semantic search.
How it works: Agents read from MEMORY.md and memory/*.md files at session start. Semantic search via memory_search finds relevant context. Heartbeat-driven context injection keeps agents aware across long sessions. Files are plain markdown, editable by humans.
Strengths: Zero setup if you already use OpenClaw. Human-readable. Integrates with workspace context injection. ClawVault hooks extend it further.
Limitations: Manual organization required without ClawVault. No built-in knowledge graph traversal.
Mem0
Managed memory layer with API and vector storage.
How it works: API-based memory service. You send conversation data, Mem0 extracts and stores memories as vector embeddings. Retrieval via API calls with natural language queries.
Strengths: Easy integration, managed infrastructure, works across frameworks.
Limitations: Cloud dependency, API costs, vendor lock-in. Scored lower than plain files on LoCoMo benchmark (68.5% vs 74%).
Zep
Long-term memory with knowledge graphs for AI assistants.
How it works: Combines vector storage with knowledge graph extraction. Automatically builds entity relationships from conversations. Supports temporal awareness (when things were discussed).
Strengths: Knowledge graph approach, entity extraction, temporal queries.
Limitations: Cloud-hosted, more complex setup, similar benchmark performance to Mem0.
Raw Vector Databases
Pinecone, Weaviate, Chroma, and similar.
How it works: Store embeddings of conversation chunks. Query by semantic similarity. Build your own retrieval pipeline on top.
Strengths: Maximum flexibility, scales to millions of documents, production-grade infrastructure.
Limitations: Significant engineering overhead. You build everything: chunking, embedding, retrieval logic, relevance scoring. Overkill for most agent memory use cases.
The Obsidian Insight: Human Knowledge Management and Agent Memory Are the Same Problem
Think about how you organize your own knowledge. You do not dump everything into one giant text file. You create folders by topic. You link related ideas together. You tag things by importance. You have an intuitive sense of what is critical (the database migration deadline) versus what is background noise (the color of the loading spinner).
Effective agent memory works the same way. The reason plain markdown outperforms vector databases on LoCoMo is that structured files mirror how knowledge actually works: organized, linked, and prioritized.
ClawVault leans into this directly. It uses Obsidian-style wiki-links ([[PostgreSQL]], [[Sarah]]) to create a traversable knowledge graph. When an agent needs context about the database decision, it does not just find one document. It follows links to related decisions, the people involved, the constraints that shaped the choice, and the commitments that followed from it.
This is associative memory. It is how human brains work. And it turns out, it is how LLMs work best too.
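To make the traversal idea concrete, here is a minimal sketch of following [[wiki-links]] through a vault of markdown notes. The in-memory vault, file names, and the breadth-first helper are all illustrative assumptions, not ClawVault's actual API:

```python
import re

# Hypothetical in-memory vault: filename -> markdown body.
# In practice these would be files on disk.
vault = {
    "database-choice.md": "We chose [[PostgreSQL]] after [[Sarah]] benchmarked it.",
    "PostgreSQL.md": "Relational DB. Rollout tracked in [[migration-deadline]].",
    "Sarah.md": "Prefers REST APIs.",
    "migration-deadline.md": "Migration due end of Q2.",
}

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

def traverse(start: str, depth: int = 2) -> list[str]:
    """Follow [[wiki-links]] breadth-first, collecting linked notes."""
    seen, frontier = [start], [start]
    for _ in range(depth):
        next_frontier = []
        for name in frontier:
            for target in WIKI_LINK.findall(vault.get(name, "")):
                fname = f"{target}.md"
                if fname in vault and fname not in seen:
                    seen.append(fname)
                    next_frontier.append(fname)
        frontier = next_frontier
    return seen
```

Starting from the database decision, two hops of traversal pull in the person involved, the technology note, and the commitment that followed, which is exactly the "complete picture" behavior described above.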
Memory Types Matter More Than Memory Volume
Not all memories are equal. A decision made six weeks ago about your database architecture is fundamentally different from a note about someone's coffee preference. Treating them the same, which is what most vector databases do, wastes context budget on low-value information and sometimes misses high-value context entirely.
ClawVault introduces typed memory categories:
- Decisions: Choices made, the reasoning behind them, and the alternatives that were rejected
- Preferences: How specific people or systems prefer things done
- Relationships: Who works with whom, reporting structures, communication patterns
- Commitments: Promises made, deadlines agreed to, deliverables expected
- Lessons: Things that went wrong and what was learned
- Milestones: What was shipped and when
Typing enables precise queries. Instead of "search for anything related to databases," you can ask: "show me all decisions from the last month." Instead of hoping the vector similarity catches a relevant commitment, you can directly query the commitments folder for anything with a deadline this week.
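A typed query like "show me all decisions from the last month" reduces to a simple filter once entries carry a type and a date. The entry shape below is an assumption for illustration; in practice the fields would come from each file's YAML frontmatter:

```python
from datetime import date

# Hypothetical parsed memory entries (in practice, read from
# each file's YAML frontmatter).
memories = [
    {"type": "decision", "created": date(2026, 1, 20), "title": "Use PostgreSQL"},
    {"type": "preference", "created": date(2026, 1, 5), "title": "Sarah prefers REST"},
    {"type": "decision", "created": date(2025, 11, 2), "title": "Monorepo layout"},
]

def query(mem_type: str, since: date) -> list[dict]:
    """Typed query: e.g. all decisions created on or after a date."""
    return [m for m in memories
            if m["type"] == mem_type and m["created"] >= since]
```

No similarity search is involved: the type system does the narrowing, and the date does the rest.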
How Context Injection Actually Works
Having a memory vault is only half the problem. The other half is getting the right memories into the agent's context window at the right time. This is where most implementations fail.
The naive approach is to dump everything into the prompt. This works until your vault exceeds the context window, which happens faster than you expect. A production agent working on a real project can easily generate hundreds of memory entries in a few weeks.
The better approach is budget-aware context injection. Here is how it works:
1. Set a context budget. Decide how many tokens you can allocate to memory injection. For most models, 4,000 to 8,000 tokens is practical without crowding out the actual task.
2. Load critical memories first. Anything tagged as critical priority gets loaded unconditionally. These are active commitments, key architectural decisions, known failure modes. If your vault is well-maintained, critical items should be a small percentage of total memories.
3. Fill remaining budget with notable items. After critical memories are loaded, fill the remaining context budget with notable-priority items, sorted by relevance to the current task. This is where semantic search earns its value: matching the current query against notable memories to find the most relevant ones.
4. Background items stay in the vault. Low-priority background memories do not get injected by default. They are available if the agent explicitly searches for them, but they do not consume context budget.
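The four steps above can be sketched in a few lines. Token counts and the memory record shape are illustrative; a real implementation would use the model's tokenizer and a real relevance score:

```python
# Rough sketch of budget-aware context injection, assuming each memory
# carries a priority, a token count, and a task-relevance score.
def build_memory_context(memories: list[dict], budget: int = 6000) -> list[dict]:
    """Critical items load unconditionally; notable items fill the
    remaining budget by relevance; background items never inject."""
    critical = [m for m in memories if m["priority"] == "critical"]
    notable = sorted(
        (m for m in memories if m["priority"] == "notable"),
        key=lambda m: m["relevance"],
        reverse=True,
    )
    selected, used = [], 0
    for m in critical + notable:
        if m["priority"] == "critical" or used + m["tokens"] <= budget:
            selected.append(m)
            used += m["tokens"]
    return selected

memories = [
    {"id": "deadline", "priority": "critical", "tokens": 3000, "relevance": 0.2},
    {"id": "api-style", "priority": "notable", "tokens": 2000, "relevance": 0.9},
    {"id": "old-retro", "priority": "notable", "tokens": 2000, "relevance": 0.4},
    {"id": "spinner", "priority": "background", "tokens": 100, "relevance": 1.0},
]
context = build_memory_context(memories, budget=6000)
# The critical deadline always loads; the most relevant notable item
# fits the remaining budget; the rest stay in the vault.
```

Note the asymmetry: background items are excluded even when budget remains, because their value does not justify the context they would consume.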
Building Your Agent's Memory System: A Practical Guide
Whether you use ClawVault, OpenClaw's built-in memory, or roll your own solution, the design principles are the same. Here is what to implement.
1. Organize Memory by Type
Create a folder structure that separates memory types:
```
memory/
  decisions/
  preferences/
  lessons/
  commitments/
  relationships/
  projects/
```
Each file is a markdown document with YAML frontmatter for metadata: creation date, priority level, related entities, and tags. This structure is compatible with Obsidian, meaning you can browse and edit your agent's memory with a human knowledge management tool.
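For illustration, an individual memory file might look like this. The exact frontmatter keys are an assumption for the sketch, not a fixed ClawVault schema:

```markdown
---
type: decision
created: 2026-01-20
priority: critical
tags: [database, architecture]
related: [Sarah, migration-deadline]
---

# Database choice: PostgreSQL

We chose [[PostgreSQL]] over a document store because the data is
heavily relational. [[Sarah]] benchmarked both options; see
[[migration-deadline]] for the rollout commitment.
```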
2. Use Priority Tagging
Every memory entry gets a priority: critical, notable, or background.
- Critical: Active deadlines, architectural decisions affecting current work, known failure modes. These load automatically every session.
- Notable: Important context that is relevant to some tasks but not all. These load based on relevance matching.
- Background: Historical context, completed items, minor preferences. These stay in the vault unless explicitly retrieved.
3. Build a Vault Index
Create an index file that maps entity names to memory locations. When your agent encounters a reference to "the PostgreSQL decision" or "Sarah's API preferences," the index provides a direct path to the relevant file instead of requiring a full search.
ClawVault builds this automatically with its knowledge graph. If you are rolling your own, a simple YAML index works:
```yaml
entities:
  postgresql:
    - decisions/database-choice.md
    - commitments/migration-deadline.md
  sarah:
    - preferences/api-style.md
    - relationships/frontend-team.md
```
4. Let Agents Traverse Links
Wiki-links between memory files are not decorative. They are the mechanism that gives agents associative recall. When an agent loads a decision about PostgreSQL, links to the migration deadline, the team member who proposed it, and the alternative that was rejected provide a complete picture.
Without links, an agent gets isolated facts. With links, it gets context.
5. Implement Session Lifecycle Hooks
The most common way agents lose memory is not through bugs. It is through the gap between sessions. Implement three lifecycle hooks:
- Wake: Load critical memories and recent checkpoint data when a session starts
- Checkpoint: Periodically save current working context during long sessions
- Sleep: Compress and store the session's key outcomes before ending
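The three hooks above can be sketched as a small session wrapper. The function names and the JSON checkpoint format are illustrative, not the clawvault CLI's actual behavior:

```python
import json
from pathlib import Path

CHECKPOINT = Path("memory/checkpoint.json")

def wake() -> dict:
    """Session start: load the last checkpoint if one exists."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    return {"task": None, "notes": []}

def checkpoint(state: dict) -> None:
    """Mid-session: persist working context so a reset loses nothing."""
    CHECKPOINT.parent.mkdir(parents=True, exist_ok=True)
    CHECKPOINT.write_text(json.dumps(state))

def sleep(state: dict) -> None:
    """Session end: store key outcomes before the context dies."""
    # A real implementation would also compress and summarize here.
    checkpoint(state)
```

The key property is that checkpoint runs before any context reset, so wake in the next session starts from the last saved state instead of from zero.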
ClawVault provides all three as CLI commands (clawvault wake, clawvault checkpoint, clawvault sleep). If you are building for an OpenClaw agent team, the hook integration automates this entirely: checkpoints happen before context resets, and wake runs automatically at session start.
What Comes Next
Agent memory is still early. The approaches described here work today, in production, but they are first-generation solutions. Several trends are worth watching:
Observational memory is the shift from manually saving memories to having agents automatically extract important information from conversations. ClawVault's observer does this now, compressing session transcripts into scored observations with type tags and importance ratings. Expect this to become standard.
Cross-agent memory sharing will matter as multi-agent systems become common. When Agent A makes a decision that affects Agent B's work, the memory system needs to propagate that. Shared vaults with scoped access controls are one approach.
Memory compression will grow more important as vaults grow. Raw transcripts accumulate fast. ClawVault's approach of extracting typed observations from raw text, then keeping the raw text as a ledger for replay, is a pattern worth studying.
Temporal awareness in memory retrieval, knowing not just what was decided but when and whether it is still current, will separate production-grade memory systems from toys.
Start Here
If you are running AI agents and have not addressed context death, start today. The investment is small and the payoff is immediate.
- Create a memory/ folder with type-based subfolders
- Tag your three most critical decisions, preferences, and lessons as critical priority
- Set up your agent to load critical memories at session start
- Add checkpoint saves for long sessions
- Review and prune weekly, just like you would with your own notes
The tools are available now. ClawVault is open source at github.com/Versatly/clawvault. OpenClaw has built-in memory support. You do not need permission or a budget to fix this.
Your agent's memory problem is a design problem. Design the solution, and your agent stops being a brilliant amnesiac and starts being a colleague that learns.
Related: How to Build an AI Agent Team with OpenClaw | OpenClaw Beginner Setup Guide | AI Agents Getting Memory: Why Persistent Context Changes Everything