A few days ago I wrote about building a memory system for my AI assistant using PARA and file-level decay tracking. That was progress — but it didn't take long to spot the fundamental flaw. A file called job-search.md might contain thirty facts. Some of them are from this morning. Some are from three weeks ago. File-level decay treats all of them identically. That's not a memory system — it's a blunt instrument. So we redesigned it from the ground up to track individual facts.
What Is a Fact, Exactly?
The first design question was definitional: what's the right atomic unit? Go too granular and every adjective becomes its own entry. Go too coarse and you're back to the same problem — a compound statement that's half stale and half fresh, but gets treated as one thing.
The definition I landed on: one subject, one predicate, one claim.
The practical test is the and-test. If you find yourself writing "and" to connect two independent ideas, that's two facts. Ask: would you ever need to update one piece without the other? If the answer is yes, they're separate. If updating one always means updating the other, keep them together.
Examples:
✅ "Team demo is scheduled for Friday" — one fact
✅ "Prefers async standups over live meetings" — one fact
❌ "Currently job searching and prefers fully remote roles" — two facts crammed together
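The and-test can be made concrete with a small record structure. This is a sketch, not the system's actual schema — the field names (`subject`, `predicate`, `claim`) are my illustration of the one-subject-one-predicate-one-claim rule:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Fact:
    """One subject, one predicate, one claim. Field names are illustrative."""
    subject: str    # who or what the claim is about
    predicate: str  # the relation being asserted
    claim: str      # the single claim itself
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# The and-test: this compound statement splits into two facts,
# because either half could be updated without touching the other.
compound = "Currently job searching and prefers fully remote roles"
facts = [
    Fact("user", "status", "currently job searching"),
    Fact("user", "preference", "prefers fully remote roles"),
]
```

Splitting at creation time is what makes everything downstream — per-fact decay, supersession, deduplication — possible at all.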
Durable vs. Ephemeral
Once you have atomic facts, the next insight is that not all facts decay the same way. Some facts are true until they're contradicted. Others are true until time passes. These need different treatment.
Durable — True until explicitly contradicted. Preferences, long-term decisions, contact details. Exempt from temperature decay — only invalidated by a newer contradicting assertion. Example: 'Prefers async standups over live meetings'
Ephemeral — Time-bounded by nature. Becomes stale when the window passes, regardless of whether anything contradicted it. Subject to temperature decay. Example: 'Team demo call is on Friday'
Durable facts don't temperature-decay, but they can go dormant: if a durable fact hasn't been referenced in 90 days, it gets flagged as possibly no longer relevant. Not deleted — just surfaced for review. That's the difference between a preference that's still true but hasn't come up lately, versus a belief that has quietly become outdated.
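The two review paths above can be sketched as a pair of functions — a minimal illustration of the rules stated in the post (90-day dormancy window for durable facts, hard expiry for ephemeral ones), with names of my own choosing:

```python
from datetime import datetime, timedelta, timezone

DORMANCY_WINDOW = timedelta(days=90)  # from the post: 90 days unreferenced

def review_durable(last_accessed: datetime, now: datetime) -> str:
    """Durable facts skip temperature decay entirely. If unreferenced
    for 90+ days they go 'dormant': flagged for review, never deleted."""
    if now - last_accessed >= DORMANCY_WINDOW:
        return "dormant"
    return "active"

def review_ephemeral(expires_after: datetime, now: datetime) -> str:
    """Ephemeral facts expire when their time window passes,
    regardless of whether anything contradicted them."""
    if now >= expires_after:
        return "expired"
    return "active"
```

The asymmetry is the point: a durable fact can only be retired by a contradiction or a human review, while an ephemeral fact retires itself on the clock.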
Four Statuses, Not Two
The original decay system had three temperature states: hot, warm, cold. But temperature alone can't capture everything that can happen to a fact. I added an explicit status field with four values:
active — Currently true and relevant. The default.
superseded — Replaced by a newer, contradicting fact. Linked to its successor via superseded_by. The chain is preserved — you can always trace how a belief evolved.
expired — Was true, but its time window has passed. Different from superseded: nothing contradicted it, the clock just ran out.
dormant — Durable fact with zero access count for 90+ days. Still believed, just unused. Worth reviewing.
The superseded/expired distinction matters. Superseded means a new claim replaced it. Expired means it was true but its time passed. Both preserve history — neither deletes anything. Facts are never deleted, only transitioned.
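The supersession transition can be sketched in a few lines — a hypothetical illustration (dict-based, with made-up ids) of the "transition, never delete" rule and the `superseded_by` link:

```python
STATUSES = {"active", "superseded", "expired", "dormant"}

def supersede(old: dict, new: dict) -> None:
    """Retire a fact by linking it forward to its replacement.
    The old fact keeps its content, so the belief chain is traceable."""
    old["status"] = "superseded"
    old["superseded_by"] = new["id"]

old = {"id": "f1", "claim": "Prefers live standups", "status": "active"}
new = {"id": "f2", "claim": "Prefers async standups", "status": "active"}
supersede(old, new)
# old is still in the store; following superseded_by reaches new.
```

Because nothing is deleted, "how did this belief evolve?" is always answerable by walking the `superseded_by` chain.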
Write-Ahead Log + Nightly Compaction
The trickiest design question: who creates facts, and when? We landed on both — but with different trust levels and a reconciliation step.
Inline during sessions: When something clearly factual surfaces in conversation, create the fact immediately. Don't wait for the nightly job — if the session ends abruptly or the job fails, you've lost it. These are high-confidence, user-stated facts. Fast, durable, possibly a little noisy.
Nightly batch from daily logs: The nightly review job catches what the session didn't explicitly flag — inferences, patterns, things mentioned in passing. This is also where deduplication and reconciliation happen.
The mental model: inline facts are write-ahead log (WAL) entries, and the nightly job is compaction. Inline says "this seems like a fact, save it." The nightly job answers "confirmed," "superseded," or "these two are the same thing."
Every inline fact gets a reconciled: false flag. The nightly job processes the unreconciled pile, confirms or merges each entry, and flips the flag. This gives compaction a clean entry point — it only needs to process new WAL entries, not compare every fact against every other fact on every run.
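The WAL-plus-compaction flow reduces to two small operations. This is a sketch under my own naming, not the real nightly job — the actual confirm/supersede/merge logic is elided behind a comment:

```python
def log_inline_fact(store: list, fact: dict) -> None:
    """Inline creation = write-ahead log entry: saved immediately
    during the session, marked unreconciled for the nightly pass."""
    fact["reconciled"] = False
    store.append(fact)

def nightly_compaction(store: list) -> int:
    """Process only the unreconciled pile, then flip the flag.
    Returns the number of WAL entries processed this run."""
    processed = 0
    for fact in store:
        if not fact.get("reconciled"):
            # (real job: confirm the fact, supersede an older one,
            #  or merge duplicates via the dedup pass)
            fact["reconciled"] = True
            processed += 1
    return processed
```

The `reconciled` flag is what keeps compaction incremental: a second run over the same store processes zero entries instead of re-comparing everything.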
Deduplication Without Semantic Search
Detecting duplicate facts — "Prefers async standups" vs. "Dislikes synchronous meetings" — ideally requires semantic similarity. Vectorized semantic search wasn't available for this purpose. So the dedup strategy has to be practical rather than ideal.
We settled on a two-layer approach:
Structural keying. Each fact is indexed by type + subject entity at creation time. New facts only get compared against existing facts in the same bucket, a dramatically smaller comparison space.
LLM reasoning on flagged candidates. The nightly job runs as a Claude agent. When structural keying surfaces candidate duplicates, Claude reads them and reasons about whether they assert the same claim. Output: skip, merge, or supersede. This is Claude thinking, not vector search.
It's not perfect — pure paraphrases with no structural overlap can still slip through. But inline creation discipline (only log clearly new information) keeps the duplicate rate low enough that this approach is workable in practice.
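The structural-keying layer is cheap to sketch. A hypothetical illustration (function names mine) of how bucketing shrinks the comparison space before any LLM reasoning happens:

```python
from collections import defaultdict

def bucket_key(fact: dict) -> tuple:
    """Structural key: type + subject entity. Only facts sharing a
    bucket are candidate duplicates worth showing to the LLM."""
    return (fact["type"], fact["subject"])

def candidate_duplicates(existing: list, new_fact: dict) -> list:
    """Return only same-bucket facts; everything else is skipped."""
    buckets = defaultdict(list)
    for f in existing:
        buckets[bucket_key(f)].append(f)
    return buckets[bucket_key(new_fact)]

existing = [
    {"type": "preference", "subject": "user", "claim": "Prefers async standups"},
    {"type": "event", "subject": "team", "claim": "Demo call is on Friday"},
]
new = {"type": "preference", "subject": "user",
       "claim": "Dislikes synchronous meetings"}
candidates = candidate_duplicates(existing, new)  # one candidate, not two
```

Only the surviving candidates are handed to the agent, which is what keeps the nightly LLM pass small and affordable.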
How Do You Know It's Working?
A memory system without observable success metrics is just vibes. We defined three layers of measurement:
Health metrics — Is the system functioning correctly? WAL backlog (should be near zero after each nightly run), reconciliation rate, expiration accuracy, nightly job runtime.
Quality metrics — Is the decay signal honest? Temperature distribution should be a pyramid — some hot, more warm, most cold. If everything is hot, decay isn't running. If everything is cold, bumping is broken.
Outcome metrics — Is it actually helping? Is context being repeated less? Is stale information bleeding into current responses? These are harder to measure but matter most.
A weekly health digest cron posts a summary to a configured notification channel. Threshold-based alerts handle acute failures without waiting for the weekly report: WAL backlog over 20, any active facts with an expired expires_after, nightly job runtime more than 2× its rolling average.
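The acute-alert thresholds named above are simple enough to express directly. A sketch of the checks as I read them from the post (the function and message strings are my own):

```python
def acute_alerts(wal_backlog: int, expired_active: int,
                 runtime_s: float, rolling_avg_s: float) -> list:
    """Threshold-based alerts that fire without waiting for the
    weekly digest, per the three conditions in the post."""
    alerts = []
    if wal_backlog > 20:
        alerts.append("WAL backlog over 20")
    if expired_active > 0:
        alerts.append("active facts with an expired expires_after")
    if runtime_s > 2 * rolling_avg_s:
        alerts.append("nightly runtime over 2x rolling average")
    return alerts
```

Note the second check is really a consistency invariant: an active fact past its `expires_after` means the expiry transition itself failed, which is worse than ordinary staleness.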
What's Next
This is a design, not a running system — yet. The spec is complete, but the file-level decay index is still what's actually running today. Building the fact-level system requires rewriting the nightly review job, creating the items.json structure across the knowledge base, and establishing the inline fact creation habit.
The gap between a good design and a working implementation is real. In Part 2, I build it.
Concepts: Write-ahead log · LRU/LFU decay · Knowledge representation · PARA method
Tools: OpenClaw · Claude Sonnet · Markdown
Next: Part 2: Building the Engine →
Originally published at https://www.paulbrennaman.me/lab/fact-level-memory-decay

