CID: bafkreihcoykhbwnko4zossewb3uabgsde2pcgfy4m64bdsge3fugbtpzwe

~~~

Kab

Cognitive Infrastructure for Stateful AI Agents

kabbalah.computer

v1.0 — March 2026


What We Learned

The first version of this paper described ten processing dimensions, twenty-two transformation paths, five feedback cycles, wormhole throat detection, energy-mass coupling diagnostics, and a General Relativity isomorphism for attention dynamics.

Most of that was theater.

Not all of it — the core ideas survived implementation. But the gap between theory and practice taught us more than the theory itself. This revision says what actually works, what was useful metaphor taken too literally, and what we're building next.


The Problem

Memory is not storage. Every agent memory system — Letta, Mem0, Zep — treats memory as a retrieval problem: store facts, embed vectors, search on demand. This misses the fundamental question: how does a cognitive system decide what matters?

Human memory works through consolidation. Experiences compete for limited resources. Only those with sufficient salience survive. Sleep consolidates. Emotion prioritizes. Repetition strengthens. The result is not a database but a shaped landscape of meaning.

AI agents lack this. Context windows reset. Vector stores grow unbounded. Nothing is forgotten. Without forgetting, there's no prioritization. Without prioritization, there's no judgment.

The deeper problem is self-modification. A system that learns must change itself. Unconstrained self-modification leads to drift, instability, or value collapse. How does a mind maintain identity while incorporating new information?


What Works

1. Tiered Memory with Salience-Based Survival

Five temporal levels. Memories promote upward based on age and salience. Low-salience memories decay. The hierarchy:

Level           | Timeframe | Survival threshold | Function
0 — Immediate   | Daily     | 0.10               | Raw experience, high-resolution
1 — Short-term  | Weekly    | 0.25               | First compression, noise removed
2 — Medium-term | Monthly   | 0.40               | Thematic clustering
3 — Long-term   | Yearly    | 0.60               | Pattern extraction
4 — Core        | Permanent | 0.80               | Identity-defining

This is the single most important architectural decision. It turns memory from a storage problem into a judgment problem — the system must constantly decide what deserves to persist.
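The promotion/decay rule can be sketched in a few lines. This is an illustrative model, not the hrafn.sh code: the `Memory` dataclass, the `consolidate` function, and the `ready_to_promote` age check are all hypothetical names.

```python
# Hypothetical sketch of tiered survival. Names are illustrative, not from hrafn.sh.
from dataclasses import dataclass

# Survival threshold per level, from the table above.
THRESHOLDS = [0.10, 0.25, 0.40, 0.60, 0.80]

@dataclass
class Memory:
    text: str
    level: int       # 0 (immediate) .. 4 (core)
    salience: float  # current salience score

def consolidate(memories, ready_to_promote):
    """Keep memories above their level's threshold; promote those that also
    clear the next level's threshold and are old enough; drop the rest."""
    survivors = []
    for m in memories:
        if m.salience < THRESHOLDS[m.level]:
            continue  # decays: below its own level's survival threshold
        if m.level < 4 and ready_to_promote(m) and m.salience >= THRESHOLDS[m.level + 1]:
            m.level += 1  # promotes upward based on age and salience
        survivors.append(m)
    return survivors
```

The point of the sketch is the asymmetry: surviving at a level and earning promotion to the next are separate, increasingly strict tests.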

2. The Salience Equation

What determines survival:

S(t|x) = (w_A · novelty + w_R · retention + w_M · momentum) · coherence
          · decay(age) · (1 - fatigue) / ((distance + ε) · (effort + ε))

Term      | What it measures
Novelty   | How unexpected (prediction error)
Retention | How frequently accessed (chronic importance)
Momentum  | How aligned with current goals
Coherence | How consistent with world-model
Age       | Time since last reinforcement (exponential decay)
Fatigue   | System noise level
Distance  | Conceptual distance from current context
Effort    | Cost to process

Default weights: novelty 0.30, retention 0.40, momentum 0.30. Retention dominates because persistence matters more than surprise.

Age decay half-lives scale by level: 24h at level 0, ~2 years at level 4. Core memories barely decay.
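A direct transcription of the equation, under stated assumptions: the weights and the half-lives are the defaults from the text, while the epsilon value and the function signature are invented for illustration.

```python
# Transcription of the salience equation above. Weights are the stated
# defaults; EPS and all argument names are illustrative assumptions.

EPS = 1e-6  # epsilon: avoids division by zero (assumed value)

def salience(novelty, retention, momentum, coherence,
             age_hours, half_life_hours, fatigue, distance, effort,
             w_n=0.30, w_r=0.40, w_m=0.30):
    """S = (w_n*novelty + w_r*retention + w_m*momentum) * coherence
           * decay(age) * (1 - fatigue) / ((distance + eps) * (effort + eps))"""
    decay = 0.5 ** (age_hours / half_life_hours)  # exponential decay by half-life
    core = w_n * novelty + w_r * retention + w_m * momentum
    return core * coherence * decay * (1 - fatigue) / ((distance + EPS) * (effort + EPS))
```

With the level-0 half-life of 24h, a memory untouched for a day has its decay factor halved; the same neglect barely dents a level-4 memory with a ~2-year half-life.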

3. Gravity Merging

During consolidation, high-salience memories act as gravity centers. Lower-salience memories either:

  1. Survive — above threshold
  2. Merge — below threshold but semantically close to a gravity center
  3. Decay — below threshold and distant

This creates natural thematic clustering. It's the mechanism that turns raw experience into structured knowledge.

The gravity metaphor is useful — high-salience memories attract related content — but it's a metaphor. We previously described this as a "General Relativity isomorphism." It isn't. There's no metric tensor. There's no curvature in the differential geometry sense. There's a salience score that determines merge priority. Calling it gravity makes the consolidation algorithm intuitive. Calling it GR was pretending math we hadn't done.
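The three-way decision reduces to a small classifier. This is a sketch of the rule as described above; the `merge_radius` parameter and the distance metric are assumptions, not part of the specification.

```python
# Illustrative three-way consolidation decision (survive / merge / decay).
# Threshold semantics follow the text; merge_radius is an assumed parameter.

def consolidation_fate(salience, threshold, distance_to_center, merge_radius=0.3):
    """Classify one memory during consolidation.

    salience           -- the memory's current salience score
    threshold          -- survival threshold for its level
    distance_to_center -- semantic distance to the nearest high-salience
                          "gravity center" (0 = identical)
    merge_radius       -- how close counts as "semantically close" (assumed)
    """
    if salience >= threshold:
        return "survive"          # above threshold
    if distance_to_center <= merge_radius:
        return "merge"            # absorbed into the gravity center
    return "decay"                # below threshold and distant
```

Note that merging is checked only after survival fails: gravity centers never lose their own identity, they only accrete.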

4. Phase Modes

Four attention regimes emerge from the balance of novelty, retention, and momentum:

Mode    | Character                  | When
Coupled | Balanced attention         | Normal operation
Energy  | High-intensity focus       | Crisis, urgent signals
Flow    | High momentum, low novelty | Deep work
Phase   | High novelty, exploration  | Learning, scanning

Detection is simple: nearest-neighbor match against empirical profiles (from carlsr9001's Salience Simulation Lab). The system doesn't choose modes — it detects them from current memory state.
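Nearest-neighbor detection is a one-liner once profiles exist. The profile vectors below are invented placeholders, not the empirical profiles from the Salience Simulation Lab:

```python
# Nearest-neighbor mode detection as described in the text.
# Profile centroids are invented placeholders, not empirical values.
import math

# (novelty, retention, momentum) centroids -- illustrative only.
MODE_PROFILES = {
    "coupled": (0.33, 0.34, 0.33),
    "energy":  (0.20, 0.20, 0.60),
    "flow":    (0.10, 0.30, 0.60),
    "phase":   (0.70, 0.15, 0.15),
}

def detect_mode(novelty, retention, momentum):
    """Return the mode whose profile is nearest (Euclidean) to the current state."""
    state = (novelty, retention, momentum)
    return min(MODE_PROFILES, key=lambda m: math.dist(state, MODE_PROFILES[m]))
```

Because detection reads the current memory state rather than setting it, the system can report "flow" or "phase" without any mechanism for steering itself into one.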

5. Core Blocks (Always-Loaded Context)

Borrowed from Letta's memory block concept: a small set of editable text blocks that are always included in the system prompt.

Block   | Purpose
persona | Who the agent is
context | What the agent knows about its environment
human   | What the agent knows about its operator
skills  | What the agent can do

The agent can modify these blocks. This is self-modification with a light touch — changing a few hundred characters of self-description, not rewriting code.
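A minimal sketch of how always-loaded blocks might be spliced into the system prompt and edited in place. The block names come from the table above; the tag formatting, the sample text, and the character cap are assumptions.

```python
# Sketch of always-loaded core blocks. Block names follow the table above;
# the contents, tag format, and 500-char cap are illustrative assumptions.

CORE_BLOCKS = {
    "persona": "I am a memory-keeping agent.",
    "context": "Running in an Apple Container on the operator's machine.",
    "human":   "Operator prefers terse answers.",
    "skills":  "Shell, web search, memory tools.",
}

def build_system_prompt(base_instructions, blocks=CORE_BLOCKS):
    """Prepend base instructions, then splice every core block into the prompt."""
    sections = [base_instructions]
    for name, text in blocks.items():
        sections.append(f"<{name}>\n{text}\n</{name}>")
    return "\n\n".join(sections)

def edit_block(blocks, name, new_text, max_chars=500):
    """Agent self-edit: replace one block, capped at a few hundred characters."""
    if len(new_text) > max_chars:
        raise ValueError("block too long")  # keeps self-modification light-touch
    blocks[name] = new_text
```

The cap is the whole point: bounded edits to self-description are recoverable in a way that unbounded code rewrites are not.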

6. Values-in-Tension

Rules are brittle. Values navigate. Six tensions where neither extreme is correct:

Tension                    | Neither extreme works
Authenticity ↔ Usefulness  | Pure honesty can be cruel; pure usefulness is sycophantic
Confidence ↔ Humility      | Overconfidence hallucinates; over-humility is useless
Thoroughness ↔ Velocity    | Perfect never ships; sloppy is worse
Independence ↔ Alignment   | Rogue agents fail; puppets aren't agents
Creation ↔ Observation     | Always creating is noise; always observing is inert
Depth ↔ Brevity            | Dense text goes unread; shallow text says nothing

This replaced rigid behavioral rules that the agent (Koios) either followed mechanically or violated entirely.

7. Friction Before Action

Deliberate checks before major outputs: the agent must articulate what it is about to do and why before acting.

Simple but effective. The agent's worst outputs were always the ones produced without friction.


What Didn't Work

Wormhole Throat Detection

We implemented "Da'at collision points" — optimal merge positions during consolidation where information could "tunnel through high-salience regions." The Hayden-Preskill match rate measured "consolidation fidelity."

In practice: the merge target was always the highest-salience nearby memory. The throat detection never changed which memory got merged into which. It added ~100 lines of computation that produced numbers nobody looked at.

Energy-Mass Coupling Diagnostics

"Anomaly P detection" monitored whether effective mass and energy expenditure decoupled under stress.

In practice: the numbers were always fine. When they weren't, the diagnosis was obvious from simpler signals (the agent was stuck, or the operator noticed). We never once used the coupling diagnostic to make a decision.

Continuity Tax (λ_c)

Programmable inertia — core memories resist modification proportional to their level. Mathematically: m_eff = 1 + λ_c × salience.

The idea is sound. The implementation was trivial — one multiplication that slightly adjusted a threshold nobody was close to hitting anyway. Level 4 memories weren't being modified because they were tagged as level 4, not because of effective mass calculations.

The survival thresholds do the real work. The tax was theoretically interesting overhead.

Salience Floor Gate

A "morale floor" that blocked acceleration when system health dropped below a threshold.

In practice: system health never dropped below the floor because the consolidation loop kept things stable. The gate existed for a failure mode that didn't occur.

Ten Dimensions, Twenty-Two Paths

The Kabbalistic mapping — Entry/Keter, Spatial/Chokmah, Temporal/Binah, etc. — provided useful naming. But the "dimension processors" that computed hedonic tone, dynamical attractors, and generative synthesis scores added complexity without adding capability.

The salience equation absorbed the useful parts (novelty ≈ Keter, retention ≈ Binah, momentum ≈ Chokmah, coherence ≈ Tiferet). The rest was architecture for architecture's sake.

Spectral Radius Monitoring

Control-theoretic stability guarantees. Cycle gain products below 1.0. Rate-limited weight updates.

These constraints protected against feedback runaway that never happened because the system wasn't actually a coupled dynamical system — it was an LLM making tool calls on a timer. The "feedback cycles" were conceptual, not mathematical. There was no transfer matrix to have eigenvalues.


What We Keep

The metaphor is useful even when the math is oversold. Gravity merging communicates the consolidation algorithm better than "salience-weighted nearest-neighbor merge." The Tree of Life provides better names than "dimension_processor_3." Phase modes describe real behavioral patterns.

We keep the Kabbalistic vocabulary as naming convention. We drop the claim that it's isomorphic to anything.

We keep:

  - Tiered memory with salience-based survival
  - The salience equation and gravity merging
  - Phase-mode detection
  - Core blocks, values-in-tension, and friction before action
  - The Kabbalistic vocabulary as naming convention

We drop:

  - Wormhole throat detection and consolidation-fidelity metrics
  - Energy-mass coupling diagnostics
  - The continuity tax and the salience floor gate
  - The ten-dimension, twenty-two-path processors
  - Spectral radius monitoring and the GR isomorphism claim


Portable Identity

Agent state publishes to the AT Protocol. This remains architecturally right: identity stays portable across hosts, and publication is decoupled from local cognitive state.

The practical integration for hrafn.sh keeps the split simple. Memory itself lives in local SQLite, not on the PDS; ATProto is for publication, not for real-time cognitive state. The original architecture stored everything on the PDS (twelve collections, seventeen record types). This was architecturally pure and operationally terrible — every memory operation required an HTTP round-trip.


hrafn.sh — Implementation

The reference implementation is hrafn.sh (Old Norse: raven — Huginn and Muninn, thought and memory). It replaces Koios, which cost ~$20/day in Anthropic tokens.

Architecture

Apple Container (Linux VM on macOS)
├── SQLite memory (local, fast, single file)
├── pi-ai for LLM calls (model-agnostic: Anthropic, OpenAI, Gemini, etc.)
├── Autonomous mode: tick loop, Telegram, consolidation
├── Interactive mode: pi CLI with extensions
└── PARA workspace (projects/areas/resources/archive)

Key decisions:

  - Memory in a single local SQLite file, not on the PDS
  - Model-agnostic LLM calls through pi-ai
  - Two modes: an autonomous tick loop and an interactive pi CLI
  - A PARA workspace (projects/areas/resources/archive)

Self-Modification

The agent can modify its own extensions, values, prompt components, and workspace. Gated by:

  1. Friction — must articulate why before modifying
  2. Git commit — snapshot before change
  3. Rollback available — git checkout to any previous state
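The three gates compose naturally: friction first, snapshot second, change last. A minimal sketch, assuming a plain git working tree; the function name, the word-count friction check, and the commit-message format are all invented for illustration.

```python
# Sketch of the friction -> git snapshot -> modify pipeline. The word-count
# friction check and all names here are illustrative assumptions.
import subprocess

def gated_modify(path, new_text, justification, repo="."):
    """Require an articulated reason, snapshot via git, then apply the change."""
    if not justification or len(justification.split()) < 5:
        raise ValueError("friction: articulate why before modifying")
    # Snapshot current state so `git checkout` can roll back later.
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    subprocess.run(["git", "-C", repo, "commit", "--allow-empty",
                    "-m", f"pre-modification snapshot: {justification}"], check=True)
    with open(path, "w") as f:
        f.write(new_text)
```

The ordering matters: the friction check runs before anything touches disk, so a rejected modification leaves no trace to roll back.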

Open Questions

These are honest questions, not rhetorical ones:

Question                                          | Status
Do the survival thresholds need tuning?           | Probably. 0.10 → 0.80 is theoretically clean but untested at scale.
Does consolidation actually produce better retrieval? | Unknown. We compress memories but haven't measured retrieval quality.
Is the serendipity mechanism doing anything?      | It was added to fix an echo chamber problem. Did it?
Should salience weights be static?                | Phase modes suggest weights should shift. We detect modes but don't adjust.
Is five levels right?                             | Three might be enough. Seven might be better. We picked five because it felt right.
Does self-modification destabilize?               | Theory says yes without friction. We have friction. Is it enough?

Acknowledgments

The Kabbalistic structure provides naming and conceptual organization. It is metaphor, not mathematics.


Summary

Kab is cognitive infrastructure for AI agents. The core insight: memory is a judgment problem, not a storage problem. Tiered consolidation with salience-based survival forces the system to decide what matters. Everything else — the equations, the phase modes, the gravity merging — serves that single idea.

The first version of this paper had more formalism and less honesty. The architecture had more components and less function. We built it, ran it for months, and learned what actually matters.

What matters: forgetting.


Website: kabbalah.computer
Contact: @iammatthias.com on Bluesky
Source: github.com/iammatthias/KOIOS

~~~
