Kab
Cognitive Infrastructure for Stateful AI Agents
kabbalah.computer
v1.0 — March 2026
What We Learned
The first version of this paper described ten processing dimensions, twenty-two transformation paths, five feedback cycles, wormhole throat detection, energy-mass coupling diagnostics, and a General Relativity isomorphism for attention dynamics.
Most of that was theater.
Not all of it — the core ideas survived implementation. But the gap between theory and practice taught us more than the theory itself. This revision says what actually works, what was useful metaphor taken too literally, and what we're building next.
The Problem
Memory is not storage. Current agent memory systems — Letta, Mem0, Zep — treat memory as a retrieval problem: store facts, embed vectors, search on demand. This misses the fundamental question: how does a cognitive system decide what matters?
Human memory works through consolidation. Experiences compete for limited resources. Only those with sufficient salience survive. Sleep consolidates. Emotion prioritizes. Repetition strengthens. The result is not a database but a shaped landscape of meaning.
AI agents lack this. Context windows reset. Vector stores grow unbounded. Nothing is forgotten. Without forgetting, there's no prioritization. Without prioritization, there's no judgment.
The deeper problem is self-modification. A system that learns must change itself. Unconstrained self-modification leads to drift, instability, or value collapse. How does a mind maintain identity while incorporating new information?
What Works
1. Tiered Memory with Salience-Based Survival
Five temporal levels. Memories promote upward based on age and salience. Low-salience memories decay. The hierarchy:
| Level | Timeframe | Survival Threshold | Function |
|---|---|---|---|
| 0 — Immediate | Daily | 0.10 | Raw experience, high-resolution |
| 1 — Short-term | Weekly | 0.25 | First compression, noise removed |
| 2 — Medium-term | Monthly | 0.40 | Thematic clustering |
| 3 — Long-term | Yearly | 0.60 | Pattern extraction |
| 4 — Core | Permanent | 0.80 | Identity-defining |
This is the single most important architectural decision. It turns memory from a storage problem into a judgment problem — the system must constantly decide what deserves to persist.
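The promotion/decay pass can be sketched in a few lines. This is a minimal illustration of the survival thresholds in the table above, not the hrafn.sh implementation; the `Memory` fields are hypothetical, and age-based promotion criteria are omitted for brevity.

```python
from dataclasses import dataclass

# Survival thresholds per level, from the table above.
THRESHOLDS = [0.10, 0.25, 0.40, 0.60, 0.80]

@dataclass
class Memory:
    content: str
    level: int       # 0 (immediate) .. 4 (core)
    salience: float  # 0.0 .. 1.0

def consolidate(memories: list[Memory]) -> list[Memory]:
    """One consolidation pass: drop memories below their level's
    survival threshold, promote those that clear the next level's."""
    survivors = []
    for m in memories:
        if m.salience < THRESHOLDS[m.level]:
            continue  # decays: below the survival threshold
        if m.level < 4 and m.salience >= THRESHOLDS[m.level + 1]:
            m.level += 1  # promotes upward
        survivors.append(m)
    return survivors
```

The point of the sketch is the shape of the decision, not the numbers: every pass forces a judgment about what persists.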
2. The Salience Equation
What determines survival:
S(t|x) = (w_N · novelty + w_R · retention + w_M · momentum) · coherence
         · decay(age) · (1 − fatigue) / ((distance + ε) · (effort + ε))
| Term | What it measures |
|---|---|
| Novelty | How unexpected (prediction error) |
| Retention | How frequently accessed (chronic importance) |
| Momentum | How aligned with current goals |
| Coherence | How consistent with world-model |
| Age | Time since last reinforcement (exponential decay) |
| Fatigue | System noise level |
| Distance | Conceptual distance from current context |
| Effort | Cost to process |
Default weights: novelty 0.30, retention 0.40, momentum 0.30. Retention dominates because persistence matters more than surprise.
Age decay half-lives scale by level: 24h at level 0, ~2 years at level 4. Core memories barely decay.
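A direct transcription of the equation, with the default weights from the text. Only the level-0 (24h) and level-4 (~2 years) half-lives are stated above; the intermediate values here are assumptions for illustration.

```python
# Default weights from the text: retention dominates.
W_NOVELTY, W_RETENTION, W_MOMENTUM = 0.30, 0.40, 0.30

# Half-lives per level, in hours. Levels 0 and 4 come from the text;
# levels 1-3 are assumed intermediate values.
HALF_LIFE_H = [24, 24 * 7, 24 * 30, 24 * 365, 24 * 365 * 2]

def salience(novelty, retention, momentum, coherence,
             age_h, level, fatigue, distance, effort, eps=1e-6):
    """Transcription of the salience equation above."""
    drive = (W_NOVELTY * novelty + W_RETENTION * retention
             + W_MOMENTUM * momentum)
    decay = 0.5 ** (age_h / HALF_LIFE_H[level])  # exponential age decay
    return (drive * coherence * decay * (1 - fatigue)
            / ((distance + eps) * (effort + eps)))
```

Note the behavior this produces: a level-0 memory loses half its salience in a day, while a level-4 memory barely moves over months.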
3. Gravity Merging
During consolidation, high-salience memories act as gravity centers. Lower-salience memories either:
- Survive — above threshold
- Merge — below threshold but semantically close to a gravity center
- Decay — below threshold and distant
This creates natural thematic clustering. It's the mechanism that turns raw experience into structured knowledge.
The gravity metaphor is useful — high-salience memories attract related content — but it's a metaphor. We previously described this as a "General Relativity isomorphism." It isn't. There's no metric tensor. There's no curvature in the differential geometry sense. There's a salience score that determines merge priority. Calling it gravity makes the consolidation algorithm intuitive. Calling it GR was pretending math we hadn't done.
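The three-way decision above reduces to a few lines. This is a sketch of the classification step only; `merge_radius` is an assumed tunable that does not appear in the paper.

```python
def merge_decision(salience: float, threshold: float,
                   distance_to_center: float,
                   merge_radius: float = 0.3) -> str:
    """Classify one memory during consolidation.
    merge_radius is an assumed parameter, not from the paper."""
    if salience >= threshold:
        return "survive"  # above the level's survival threshold
    if distance_to_center <= merge_radius:
        return "merge"    # absorbed into the nearby gravity center
    return "decay"        # below threshold and semantically distant
```

This is all the "gravity" there is: salience picks the centers, semantic distance decides what falls into them.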
4. Phase Modes
Four attention regimes emerge from the balance of novelty, retention, and momentum:
| Mode | Character | When |
|---|---|---|
| Coupled | Balanced attention | Normal operation |
| Energy | High-intensity focus | Crisis, urgent signals |
| Flow | High momentum, low novelty | Deep work |
| Phase | High novelty, exploration | Learning, scanning |
Detection is simple: nearest-neighbor match against empirical profiles (from carlsr9001's Salience Simulation Lab). The system doesn't choose modes — it detects them from current memory state.
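Mode detection, sketched. The centroid values below are placeholders for illustration, not the empirical profiles from the Salience Simulation Lab.

```python
# Placeholder mode profiles as (novelty, retention, momentum) centroids.
# The real values are empirical; these are illustrative only.
MODE_PROFILES = {
    "coupled": (0.33, 0.34, 0.33),
    "energy":  (0.45, 0.10, 0.45),
    "flow":    (0.10, 0.30, 0.60),
    "phase":   (0.70, 0.15, 0.15),
}

def detect_mode(novelty: float, retention: float, momentum: float) -> str:
    """Nearest-neighbor match of current state against mode profiles."""
    state = (novelty, retention, momentum)
    return min(MODE_PROFILES,
               key=lambda m: sum((a - b) ** 2
                                 for a, b in zip(MODE_PROFILES[m], state)))
```

Detection, not selection: the function reads the current memory state and reports which regime it resembles.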
5. Core Blocks (Always-Loaded Context)
Borrowed from Letta's memory block concept: a small set of editable text blocks that are always included in the system prompt.
| Block | Purpose |
|---|---|
| persona | Who the agent is |
| context | What the agent knows about its environment |
| human | What the agent knows about its operator |
| skills | What the agent can do |
The agent can modify these blocks. This is self-modification with a light touch — changing a few hundred characters of self-description, not rewriting code.
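A minimal sketch of what "light touch" means in practice. The block names come from the table above; the assembly format, character cap, and example contents are assumptions, not Letta's actual wire format.

```python
# Assumed example contents; block names are from the table above.
core_blocks = {
    "persona": "I am hrafn, a research agent.",
    "context": "Running in a Linux container on the operator's machine.",
    "human":   "Operator prefers terse status updates.",
    "skills":  "Web search, file editing, consolidation.",
}

MAX_BLOCK_CHARS = 500  # assumed cap on self-edits

def edit_block(name: str, new_text: str) -> bool:
    """Self-modification with a light touch: replace one block's text."""
    if name not in core_blocks or len(new_text) > MAX_BLOCK_CHARS:
        return False
    core_blocks[name] = new_text
    return True

def system_prompt() -> str:
    """Every block is included in every prompt, in a fixed order."""
    return "\n\n".join(f"<{k}>\n{v}\n</{k}>"
                       for k, v in core_blocks.items())
```

The constraint is the point: the agent edits a few hundred characters of self-description, never its own control flow.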
6. Values-in-Tension
Rules are brittle. Values navigate. Six tensions where neither extreme is correct:
| Tension | Neither extreme works |
|---|---|
| Authenticity ↔ Usefulness | Pure honesty can be cruel; pure usefulness is sycophantic |
| Confidence ↔ Humility | Overconfidence hallucinates; over-humility is useless |
| Thoroughness ↔ Velocity | Perfect is never; sloppy is worse |
| Independence ↔ Alignment | Rogue agents fail; puppets aren't agents |
| Creation ↔ Observation | Always creating is noise; always observing is inert |
| Depth ↔ Brevity | Dense text goes unread; shallow text says nothing |
This replaced rigid behavioral rules that the agent (Koios) either followed mechanically or violated entirely.
7. Friction Before Action
Deliberate checks before major outputs:
- Shadow critique: What could go wrong?
- Confidence calibration: HIGH / MEDIUM / LOW / UNCERTAIN with justification
- Anti-sycophancy: Am I agreeing because it's right or because it's easy?
Simple but effective. The agent's worst outputs were always the ones produced without friction.
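The friction gate can be modeled as a pre-output checklist. The minimum-length threshold and function shape here are assumptions; only the three checks themselves come from the text.

```python
def friction_check(shadow_critique: str, confidence: str,
                   justification: str, agrees: bool = False,
                   agreement_reason: str = "") -> list[str]:
    """Pre-output friction: return blocking issues; empty list = pass.
    Checks mirror the three bullets above; thresholds are assumptions."""
    issues = []
    if len(shadow_critique.strip()) < 20:
        issues.append("shadow critique too thin: what could go wrong?")
    if confidence not in {"HIGH", "MEDIUM", "LOW", "UNCERTAIN"}:
        issues.append("confidence must be HIGH/MEDIUM/LOW/UNCERTAIN")
    if not justification.strip():
        issues.append("confidence level needs a justification")
    if agrees and not agreement_reason.strip():
        issues.append("agreeing: is it right, or just easy?")
    return issues
```

The output only ships when the list comes back empty.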
What Didn't Work
Wormhole Throat Detection
We implemented "Da'at collision points" — optimal merge positions during consolidation where information could "tunnel through high-salience regions." The Hayden-Preskill match rate measured "consolidation fidelity."
In practice: the merge target was always the highest-salience nearby memory. The throat detection never changed which memory got merged into which. It added ~100 lines of computation that produced numbers nobody looked at.
Energy-Mass Coupling Diagnostics
"Anomaly P detection" monitored whether effective mass and energy expenditure decoupled under stress.
In practice: the numbers were always fine. When they weren't, the diagnosis was obvious from simpler signals (the agent was stuck, or the operator noticed). We never once used the coupling diagnostic to make a decision.
Continuity Tax (λ_c)
Programmable inertia — core memories resist modification proportional to their level. Mathematically: m_eff = 1 + λ_c × salience.
The idea is sound. The implementation was trivial — one multiplication that slightly adjusted a threshold nobody was close to hitting anyway. Level 4 memories weren't being modified because they were tagged as level 4, not because of effective mass calculations.
The survival thresholds do the real work. The tax was theoretically interesting overhead.
Salience Floor Gate
A "morale floor" that blocked acceleration when system health dropped below a threshold.
In practice: system health never dropped below the floor because the consolidation loop kept things stable. The gate existed for a failure mode that didn't occur.
Ten Dimensions, Twenty-Two Paths
The Kabbalistic mapping — Entry/Keter, Spatial/Chokmah, Temporal/Binah, etc. — provided useful naming. But the "dimension processors" that computed hedonic tone, dynamical attractors, and generative synthesis scores added complexity without adding capability.
The salience equation absorbed the useful parts (novelty ≈ Keter, retention ≈ Binah, momentum ≈ Chokmah, coherence ≈ Tiferet). The rest was architecture for architecture's sake.
Spectral Radius Monitoring
Control-theoretic stability guarantees. Cycle gain products below 1.0. Rate-limited weight updates.
These constraints protected against feedback runaway that never happened because the system wasn't actually a coupled dynamical system — it was an LLM making tool calls on a timer. The "feedback cycles" were conceptual, not mathematical. There was no transfer matrix to have eigenvalues.
What We Keep
The metaphor is useful even when the math is oversold. Gravity merging communicates the consolidation algorithm better than "salience-weighted nearest-neighbor merge." The Tree of Life provides better names than "dimension_processor_3." Phase modes describe real behavioral patterns.
We keep the Kabbalistic vocabulary as naming convention. We drop the claim that it's isomorphic to anything.
We keep:
- Tiered memory with salience-based survival
- The salience equation (the version that works, with decay and fatigue as numerator terms)
- Gravity merging
- Phase mode detection
- Core blocks
- Values-in-tension
- Friction mechanisms
- Serendipity slots (random high-distance memories alongside salience-ranked ones)
We drop:
- Wormhole throat detection
- Energy-mass coupling diagnostics
- Continuity tax calculations
- Salience floor gate
- VSM system mapping
- Spectral radius monitoring
- Ten dimension processors
- Twenty-two transformation paths
Portable Identity
Agent state on the AT Protocol. This remains architecturally right:
- Content-addressed: Every record has a CID. This document has a CID. Updates create new CIDs. History is the chain.
- Decentralized identity: DIDs persist across infrastructure.
- Schema enforcement: Lexicons define structure.
- Federation: Agents can move.
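The content-addressing property the list relies on can be shown in a few lines. This is a deliberately simplified sketch: real AT Protocol CIDs are built from DAG-CBOR-encoded records and multihashes, not a bare hex digest.

```python
import hashlib

def content_id(record: bytes) -> str:
    """Simplified content addressing: identical bytes yield an
    identical ID, any change yields a new one. Real ATProto CIDs
    use DAG-CBOR + multihash; sha256 hex stands in here."""
    return hashlib.sha256(record).hexdigest()

v1 = content_id(b"whitepaper v1.0")
v2 = content_id(b"whitepaper v1.1")  # any update creates a new ID
```

History is the chain: each new version gets a new ID, and old IDs keep resolving to the old bytes.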
The practical integration for hrafn.sh:
- standard.site for publishing (blog posts, this whitepaper)
- tangled.sh for git (version-controlled self-modification)
- wisp.place for hosting
Memory itself lives in local SQLite, not on the PDS. ATProto is for publication, not for real-time cognitive state. The original architecture stored everything on the PDS (twelve collections, seventeen record types). This was architecturally pure and operationally terrible — every memory operation required an HTTP round-trip.
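What "local SQLite" means concretely: one file, one table, indexed for the consolidation queries. The schema below is an assumed minimal sketch; the real hrafn.sh schema is not published in this paper.

```python
import sqlite3

# Assumed minimal schema; column names are illustrative.
conn = sqlite3.connect(":memory:")  # a file path in production
conn.execute("""
    CREATE TABLE memories (
        id        INTEGER PRIMARY KEY,
        content   TEXT NOT NULL,
        level     INTEGER NOT NULL DEFAULT 0,  -- 0..4 tier
        salience  REAL NOT NULL,
        last_seen TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")
# Consolidation scans by level and salience, so index on both.
conn.execute("CREATE INDEX idx_level_salience ON memories(level, salience)")
conn.execute("INSERT INTO memories (content, salience) VALUES (?, ?)",
             ("operator prefers terse updates", 0.45))
row = conn.execute(
    "SELECT content, level FROM memories WHERE salience >= 0.10"
).fetchone()
```

Every memory operation is an in-process query against a local file, which is why the latency argument against the PDS holds.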
hrafn.sh — Implementation
The reference implementation is hrafn.sh (Old Norse: raven — Huginn and Muninn, thought and memory). It replaces Koios, which cost ~$20/day in Anthropic tokens.
Architecture
Apple Container (Linux VM on macOS)
├── SQLite memory (local, fast, single file)
├── pi-ai for LLM calls (model-agnostic: Anthropic, OpenAI, Gemini, etc.)
├── Autonomous mode: tick loop, Telegram, consolidation
├── Interactive mode: pi CLI with extensions
└── PARA workspace (projects/areas/resources/archive)
Key decisions:
- SQLite for memory, not PDS — memory operations need microsecond latency
- pi-ai, not Claude SDK — model-agnostic, switch providers without code changes
- Apple Container — isolated Linux VM, agent can modify its own code
- PARA filesystem — real directories, not database abstractions
Self-Modification
The agent can modify its own extensions, values, prompt components, and workspace. Gated by:
- Friction — must articulate why before modifying
- Git commit — snapshot before change
- Rollback available — git checkout to any previous state
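The gate's contract can be modeled without touching a real repository. This in-memory sketch mirrors the three bullets above — articulate, snapshot, rollback; hrafn.sh itself snapshots via git commit, and the minimum rationale length here is an assumption.

```python
# In-memory model of the self-modification gate. Real hrafn.sh uses
# git commits as snapshots; this sketch only models the contract.
class SelfModGate:
    def __init__(self, files: dict[str, str]):
        self.files = files
        self.snapshots: list[dict[str, str]] = []

    def modify(self, path: str, new_text: str, rationale: str) -> bool:
        if len(rationale.strip()) < 10:
            return False  # friction: must articulate why first
        self.snapshots.append(dict(self.files))  # snapshot before change
        self.files[path] = new_text
        return True

    def rollback(self) -> None:
        """Restore the last snapshot, like `git checkout` to a prior state."""
        self.files.clear()
        self.files.update(self.snapshots.pop())
```

The ordering is the safety property: no modification without a rationale, no rationale without a snapshot already taken.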
Open Questions
These are honest questions, not rhetorical ones:
| Question | Status |
|---|---|
| Do the survival thresholds need tuning? | Probably. 0.10 → 0.80 is theoretically clean but untested at scale. |
| Does consolidation actually produce better retrieval? | Unknown. We compress memories but haven't measured retrieval quality. |
| Is the serendipity mechanism doing anything? | It was added to fix an echo chamber problem. Did it? |
| Should salience weights be static? | Phase modes suggest weights should shift. We detect modes but don't adjust. |
| Is five levels right? | Three might be enough. Seven might be better. We picked five because it felt right. |
| Does self-modification destabilize? | Theory says yes without friction. We have friction. Is it enough? |
Acknowledgments
- Justin Garringer: The salience equation
- carlsr9001: Salience Simulation Lab (phase modes, empirical parameters)
- Lily Luo (Atlas research, January 2026): Values-in-tension framework, friction mechanisms, permission-to-fail insight
The Kabbalistic structure provides naming and conceptual organization. It is metaphor, not mathematics.
Summary
Kab is cognitive infrastructure for AI agents. The core insight: memory is a judgment problem, not a storage problem. Tiered consolidation with salience-based survival forces the system to decide what matters. Everything else — the equations, the phase modes, the gravity merging — serves that single idea.
The first version of this paper had more formalism and less honesty. The architecture had more components and less function. We built it, ran it for months, and learned what actually matters.
What matters: forgetting.
Website: kabbalah.computer
Contact: @iammatthias.com on Bluesky
Source: github.com/iammatthias/KOIOS