The longer my AI assistant has been running, the smarter it should get — more context, more history, more remembered preferences. The problem is that "more memory" only helps if it can actually find what it stored. I hit a wall where my assistant was clearly forgetting things that were written down, and I went digging for a fix.
The Problem with Default Memory Search
OpenClaw stores memory as plain Markdown files — a daily log, a long-term MEMORY.md, whatever you write to disk. The built-in search indexes those files in SQLite and does its best, but it's largely keyword-based. That means if you search for "home server setup" and your note says "run the gateway on the Mac Mini in the closet," you get nothing. The words don't match, so the memory doesn't surface.
As memory grows, this gets worse. Weeks of daily notes, preferences, project decisions — all technically there, increasingly unfindable.
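To make the failure mode concrete, here's a toy sketch of pure keyword matching. This is illustrative only, nothing like OpenClaw's real indexer, and the note and queries are hypothetical:

```python
def keyword_match(query: str, note: str) -> bool:
    """Naive keyword search: the note matches only if it shares a term with the query."""
    query_terms = set(query.lower().split())
    note_terms = set(note.lower().split())
    return bool(query_terms & note_terms)

# A stored memory note and a later query that means the same thing:
note = "run the gateway on the Mac Mini in the closet"

print(keyword_match("home server setup", note))  # False: zero shared terms
print(keyword_match("gateway logs", note))       # True: exact term overlap
```

Semantically, "home server setup" is exactly what that note describes, but with zero term overlap a keyword index can never return it.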
Enter QMD
QMD (Query Markup Documents) is a local search engine for Markdown files built by Tobi Lütke (founder of Shopify). It combines three search strategies:
BM25 full-text search — fast keyword matching, great for exact terms, IDs, error messages
Vector semantic search — finds conceptually similar content even when wording differs
LLM re-ranking — merges both result sets and re-ranks them with a local model for best quality
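The merge step can be sketched with reciprocal rank fusion, a common way to combine ranked lists from different retrievers. This is an illustrative stand-in, not QMD's actual fusion code, and the file names are made up:

```python
def rrf_merge(bm25_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Merge two ranked result lists with reciprocal rank fusion (RRF).

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked well by either strategy float to the top of the merged list.
    """
    scores: dict[str, float] = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["notes/errors.md", "notes/setup.md", "notes/todo.md"]
vector = ["notes/setup.md", "notes/hardware.md"]
print(rrf_merge(bm25, vector))  # notes/setup.md first: it scores in both lists
```

The merged candidate list is then what a local model would re-rank for final quality.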
The whole thing runs locally via node-llama-cpp with small GGUF models. No API keys, no cloud, no data leaving your machine. OpenClaw already supports QMD as a first-class memory backend — you just have to turn it on.
Important caveat: layers 2 and 3 require a CUDA-capable GPU to be usable in practice. On CPU-only hardware, only BM25 is viable. More on this below.
One claimed benefit: a 95%+ reduction in token usage, because retrieval happens on your machine instead of inside the model's context window. Rather than stuffing large chunks of memory into context and hoping the model finds the relevant bit, QMD returns only the right snippets.
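Some back-of-the-envelope arithmetic shows how a reduction that large is plausible. The numbers below are hypothetical, not measurements:

```python
# Hypothetical sizes: weeks of memory files grow to ~40k tokens, while a
# search returns only the top 6 snippets at ~250 tokens each.
full_context_tokens = 40_000     # stuffing whole memory files into the prompt
snippet_tokens = 6 * 250         # 6 retrieved snippets of ~250 tokens

savings = 1 - snippet_tokens / full_context_tokens
print(f"{savings:.0%}")  # 96%
```

The savings scale with memory size: the prompt cost of retrieval stays roughly constant while the corpus keeps growing.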
The Setup
OpenClaw natively supports QMD as a drop-in memory backend. Here's how to enable it.
1. Install QMD
Install the QMD CLI globally via npm (or bun if you have it):
npm install -g @tobilu/qmd --prefix ~/.npm-global
Verify it installed:
~/.npm-global/bin/qmd --version
# qmd 1.0.7
2. Update OpenClaw config
Add a memory section to ~/.openclaw/openclaw.json:
"memory": {
"backend": "qmd",
"qmd": {
"includeDefaultMemory": true,
"update": { "interval": "5m" },
"limits": { "maxResults": 6 },
"scope": {
"default": "deny",
"rules": [
{ "action": "allow", "match": { "chatType": "direct" } }
]
}
}
}
The scope config is important — it restricts QMD memory results to direct/private chats only, so your personal memory doesn't leak into group conversations.
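For intuition, here's how a default-deny rule set like this one evaluates, assuming first-match-wins semantics. This is my reading of the config shape, not OpenClaw's documented algorithm:

```python
def allowed(chat_type: str, rules: list[dict], default: str = "deny") -> bool:
    """Evaluate scope rules: the first rule whose match applies wins; otherwise the default."""
    for rule in rules:
        if rule["match"].get("chatType") == chat_type:
            return rule["action"] == "allow"
    return default == "allow"

rules = [{"action": "allow", "match": {"chatType": "direct"}}]
print(allowed("direct", rules))  # True:  memory surfaces in private chats
print(allowed("group", rules))   # False: falls through to the deny default
```

Default-deny plus a narrow allow rule is the safe shape here: anything you forget to list stays private.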
3. Restart the gateway
openclaw gateway restart
The Gotcha: PATH
After restarting, I checked the logs and found this repeating:
qmd collection add failed: spawn qmd ENOENT
OpenClaw couldn't find the qmd binary. The issue: the gateway runs as a systemd service with its own PATH environment variable, and it didn't include the directory where npm installed the binary.
The fix is to install QMD into a directory that's already on the gateway's PATH. Check the gateway service's PATH with:
grep PATH ~/.config/systemd/user/openclaw-gateway.service
Then install QMD into a directory from that list — in my case ~/.npm-global/bin was already included, so using --prefix ~/.npm-global during install put the binary exactly where the gateway could find it.
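The underlying mechanism is easy to reproduce: a spawned child process is resolved against the parent's PATH, and systemd services usually get a minimal PATH that doesn't include npm's install directory. A small sketch using Python's shutil.which, demonstrated with `sh` since it lives in a standard system directory:

```python
import shutil

minimal_path = "/usr/bin:/bin"   # typical minimal PATH for a systemd service
empty_path = "/nonexistent"      # a PATH missing the binary's directory

print(shutil.which("sh", path=minimal_path) is not None)  # True: found
print(shutil.which("sh", path=empty_path))                # None -> spawn ENOENT
```

The gateway was in the second situation: `qmd` existed on disk, just not in any directory its PATH actually listed.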
Verifying It Works
Once installed correctly, the gateway logs on restart should look clean — no ENOENT errors, just:
qmd memory startup initialization armed for agent "main"
And you can confirm QMD is actively running:
ps aux | grep qmd
# node .../qmd.js embed ← indexing your memory files
The first-time setup downloads about 2GB of local GGUF models (embedding, reranker, query expansion) — one-time cost, happens automatically in the background.
A Note on Graceful Fallback
One thing I appreciated: if QMD isn't working (binary missing, crash, whatever), OpenClaw silently falls back to the built-in SQLite search. Nothing breaks. You just don't get the upgraded search until it's fixed. That made the whole setup process low-risk — I could debug without losing memory functionality in the meantime.
Hardware Reality Check
After getting QMD running, we wanted to verify that all three search layers actually worked — not just assume they did because the setup completed. Here's what we found on my server (GCP e2-medium: 2 vCPUs, 4 GB RAM, no GPU):
Layer 1 — BM25 ✅ — Fast, lightweight, no model needed. Returns results in milliseconds. This is what OpenClaw uses by default.
Layer 2 — Vector semantic search ❌ — Even qmd vsearch (advertised as 'no reranking') still loads the 1.7B query expansion model before doing vector similarity. On CPU, it pegged both cores at 100% for 15+ minutes and consumed 2.7 GB of 4 GB RAM before I killed it.
Layer 3 — LLM re-ranking ❌ — Same constraint as layer 2, but heavier. Requires the full query expansion + reranking pipeline. Completely impractical without GPU acceleration.
The root cause: QMD detects CUDA, tries to initialize it, fails (no GPU), and falls back to CPU via a try/catch in its internals. The fallback works — it doesn't crash — but CPU inference on a 1.7B model is just too slow for interactive use.
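The shape of that fallback is a familiar pattern. A generic sketch, not QMD's actual internals:

```python
def pick_backend(cuda_available: bool) -> str:
    """Try GPU first; fall back to CPU instead of crashing."""
    try:
        if not cuda_available:
            raise RuntimeError("CUDA init failed: no GPU found")
        return "cuda"
    except RuntimeError:
        # Graceful but slow: CPU inference on a 1.7B-parameter model is
        # correct, just far too slow for interactive search.
        return "cpu"

print(pick_backend(cuda_available=False))  # cpu
print(pick_backend(cuda_available=True))   # cuda
```

The catch is that "doesn't crash" and "is usable" are different bars, and on this hardware only the first one is cleared.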
What hardware would make layers 2 & 3 work?
You need a CUDA-capable NVIDIA GPU. In cloud terms, that means something like a GCP g2-standard-4 (NVIDIA L4) or an AWS g4dn.xlarge (NVIDIA T4). A local desktop or workstation with a modern NVIDIA GPU (RTX 3060 or better) would also be more than sufficient. The models QMD uses are small enough that even a mid-range GPU handles them comfortably. The e2-medium class of instance — and any CPU-only VPS — simply isn't the right fit.
Worth It?
If you have a GPU: almost certainly yes. The full pipeline — query expansion, vector similarity, LLM re-ranking — is exactly the kind of retrieval upgrade that makes a real difference as memory grows.
If you're on CPU-only hardware like I am: the honest answer is that you're getting BM25 with extra steps. That's not nothing — QMD's BM25 indexes your files cleanly and integrates tightly with OpenClaw's memory system. But the headline feature (semantic search) won't work until you add a GPU to the picture.
We got through the setup in about 30 minutes, including working through the PATH issue. And BM25 alone is a solid, reliable memory layer — just go in with accurate expectations about what the hardware you're running on can actually deliver.
Tools: OpenClaw · QMD · node-llama-cpp · systemd
Models: embedding-gemma-300M · qwen3-reranker-0.6b · qmd-query-expansion-1.7B
Originally published at https://www.paulbrennaman.me/lab/qmd-memory-search

