Every LLM session you've ever had is probably gone. MemPalace is a local-first AI memory system that stores content verbatim, indexes it in a structured hierarchy, and retrieves it with 96.6% recall at R@5 on LongMemEval — running entirely on your local machine with no API key required.
Every LLM session you've ever had is probably gone. A tab closes, a context window hits its limit, and all those decisions get lost: why you picked that library, what that error actually meant, what the plan was for that module.
MemPalace is a local-first AI memory system that stores content verbatim, indexes it in a structured hierarchy, and retrieves it with 96.6% recall at R@5 on LongMemEval. It runs entirely on your local machine with no API key required. It's open source and MIT-licensed.
Most AI memory systems summarize. MemPalace doesn't. It keeps your content as raw text (files, transcripts, messages, notes) and indexes those chunks for retrieval. Summaries throw away the details you'll want when debugging at 11pm, or when a decision stops making sense six months later. Verbatim storage means the source of truth is still there.
It's not a cloud SaaS, a monolithic agent platform, or a summarization engine. It's a memory substrate: the full history of your AI work, available on demand, on your own machine.
MemPalace organizes content in three levels:
When you mine a directory or transcript, MemPalace places the resulting drawers into this structure. Searches can be scoped to a wing or room when you know the context, which cuts noise. You're not running every query over one giant flat index.
ChromaDB is the default vector store, local and embedded. The embedding model (around 300 MB) downloads and caches locally on first use. Nothing leaves your machine unless you explicitly configure an external backend.
If you want external storage, MemPalace supports Qdrant over REST and pgvector on Postgres, both configured via connection strings and environment variables. MemPalace writes marker files to your palace directory so you don't accidentally point a palace at the wrong database after a config change.
The default setup needs no cloud infrastructure: your machine, an embedding model, and ChromaDB.
MemPalace ships benchmarks and the code to reproduce them. On LongMemEval (a 500-question long-term memory benchmark):
That 96.6% uses no API keys and no LLMs at any stage. Just local embeddings, ChromaDB, and the retrieval logic.
MemPalace also publishes results on LoCoMo, ConvoMem, and MemBench. Full per-question outputs are in the repo so you can audit or rerun the benchmarks yourself.
The retrieval layer is pluggable. Current backends:
Each backend implements the same contract, so the retrieval layer doesn't get shaped around one vendor. External backends support namespace isolation for multi-tenant use.
Beyond vector search, MemPalace includes a temporal knowledge graph built on SQLite. You can add entities and relationships with validity windows, query and traverse the graph, invalidate or update facts as they change, and build timelines around specific entities.
Use it to model "who knew what, when" or track facts that shift. "This service ran on ECS, then migrated to Kubernetes in March 2025" is the kind of thing that doesn't survive summarization. In a structured graph, it's queryable.
MemPalace is an MCP server exposing 29 tools: palace reads and writes, knowledge graph operations, cross-wing navigation, drawer management, and agent diary utilities.
Claude Code, Gemini CLI, and other MCP-compatible tools can call MemPalace as a context provider during sessions: "find the most relevant sessions about this repo," "recall what we decided about GraphQL," "show all drawers mentioning this bug ID."
Specialist agents each get their own wing and their own diary, building persistent expertise over time. They're discoverable at runtime via mempalace_list_agents rather than bloating the system prompt upfront.
Hooks are available for Claude Code, Codex, and Cursor IDE. They save conversations periodically and capture a snapshot before the host tool truncates context.
If you have old JSONL transcripts, backfill them:
mempalace mine ~/.claude/projects/ --mode convosFor per-message recall, mempalace sweep <transcript-dir> creates one drawer per message (user and assistant) in an idempotent way. That gives you message-level retrieval on top of file-level chunks.
Install into an isolated environment:
uv tool install mempalace
mempalace init ~/projects/myappMine your content and search:
# Mine content
mempalace mine ~/projects/myappIf you'd rather not install Python, there's a Docker image that runs both the CLI and the MCP server with everything persisted under /data.
Run mempalace mine against a project directory, then mempalace wake-up before your next AI session. If you've been using Claude Code or Cursor with months of history, that's the fastest way to see what MemPalace recovers that you'd written off as gone.
The repo and reproducible benchmark scripts are on GitHub.
Have a perspective on this piece? Reach out — the best writing comes from good conversation.