Every token counts.
LLMs do not primarily suffer from a lack of intelligence. They suffer from a lack of usable context. The future belongs to better representations, not larger context windows.
Raw environments are too large, too noisy, too expensive, and too undifferentiated to be the primary substrate for finite-context intelligence.
- Codebases are too big to read linearly.
- Documentation is too verbose to paste wholesale.
- Conversation history is a transcript, not a memory.
- Context rot degrades performance long before you hit the advertised limit.
Product Thesis
Helioy turns raw environments into high-signal context for finite-context intelligence. Four types of organizational knowledge, each with its own optimal representation: code as topology, documentation as scoped sections, decisions as distilled entries, identity as geometric memory. One unified API curates both reads and writes. Every token earns its place.
The Ecosystem
These are not separate products. They are different answers to the same question: how do you maximize signal per token for each type of context an agent needs?
helix
The unified context API. One interface between your agents and every context source. An LLM-powered proxy curates both reads and writes. Your agent never touches individual backends.
attention-matters
Identity as geometric memory. Quaternion positions on S³, phasor interference across dual manifolds, Kuramoto coupling. Conservation laws enforce finite attention. The owner shapes this. Agents read from it.
context-matters
Decisions as distilled knowledge. Facts, patterns, and trade-offs persisted across agent lifetimes. Hierarchical scopes, BLAKE3 deduplication, priority ordering by entry kind.
fmm
Code as indexed structure. Export maps, import graphs, dependency topology, file outlines. 5 structural queries replace 30+ file reads.
markdown-matters
Documentation as scoped sections. Section-aware indexing, hybrid search (BM25 + semantic), 80%+ token compression. Relevant sections under budget, not entire directories.
The Three Tenets
Context is Finite
Effective capacity is 50-75% of advertised limits. Performance degrades measurably beyond that threshold. Compression is thermodynamically mandated, not optional.
Structure Beats Search
Raw text search is lazy. Intelligence requires graphs, hierarchies, and typed references. If a model has to deduce the structure of a codebase, the context engine has failed.
Memory Must Compound
A conversation history is a transcript, not a memory. True memory is distilled, synthesized, and structurally updated to make future interactions more efficient.
helix
A unified context API. One interface between your agents and every context source that matters. Both reads and writes pass through an LLM-powered proxy for curation. Tantivy handles candidate retrieval. The proxy shapes responses to token budget.
helix recall <query> # curated context from all sources
helix save <content> # distill and store knowledge
helix conflicts # surface overlaps for resolution
Why the proxy matters
Agents cannot be trusted to compress well. Research shows that coherent prose can degrade performance more than incoherent text. By putting a dedicated LLM in the infrastructure layer, you get consistent compression quality regardless of which agent is reading or writing, deduplication that works across agents, quality control on writes, and curation on reads shaped to the requesting agent's token budget.
It is like staffing professional editors in every mailroom rather than relying on each employee to write clearly.
Agent --> helix --> Proxy LLM --> Adapters
                        |
                  curates both
                reads and writes
                        |
        +-------------+-------------+-------------+
        |             |             |             |
  attention-      context-      markdown-    frontmatter-
    matters        matters       matters       matters
  (identity)     (decisions)     (docs)         (code)
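The read path's budget shaping can be sketched as greedy packing of scored candidates. This is an illustration, not helix's actual code: `Candidate`, `token_cost`, and `curate` are invented names, and the four-characters-per-token heuristic is an assumption standing in for a real tokenizer.

```rust
/// A retrieved candidate with a relevance score (illustrative types).
struct Candidate {
    text: String,
    score: f64,
}

/// Rough token estimate: ~4 characters per token (an assumption,
/// not helix's tokenizer).
fn token_cost(text: &str) -> usize {
    text.len() / 4 + 1
}

/// Greedily pack the highest-scoring candidates under a token budget,
/// mirroring "curation on reads shaped to the agent's token budget".
fn curate(mut candidates: Vec<Candidate>, budget: usize) -> Vec<String> {
    candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    let mut used = 0;
    let mut out = Vec::new();
    for c in candidates {
        let cost = token_cost(&c.text);
        if used + cost <= budget {
            used += cost;
            out.push(c.text);
        }
    }
    out
}
```

The point of putting this in the proxy rather than the agent: every reader gets the same packing discipline, regardless of how well it prompts.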
attention-matters
A geometric memory engine on the S³ hypersphere. Organizational identity lives as positions and movements in curved space rather than as text. The owner shapes this manifold deliberately. Agents read from it. Every query reshapes what gets recalled next.
Query: "quaternion drift"
|
v
+-- activate ---- drift ---- interfere ---- couple ---- surface ---- compose --+
| |
| Words activate Occurrences Conscious & Phases Score, |
| on the manifold SLERP toward subconscious synchronize rank, |
| with IDF weights each other phasors via Kuramoto return |
+------------------------------------------------------------------------------+
Why the owner curates this alone
attention-matters stores values, not facts. If agents could write to it, they would deposit what they observe frequently: code patterns, common decisions, recurring problems. Over time the manifold would become a reflection of what the system does, not what the organization aspires to.
The owner pulls from external sources: research, market signals, strategic thinking, conversations that no agent in the system can generate. This is the one input that comes from outside the system's own operations. Without it, identity becomes a closed loop.
The conservation laws (total mass M=1, coupling KCON + KSUB = 1) make attention a finite resource with zero-sum allocation. Concepts compete for position in curved space where proximity determines salience.
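A minimal sketch of what those conservation laws look like as code, assuming mass is renormalized after every update and a single parameter splits the coupling budget. Function names are illustrative:

```rust
/// Enforce the zero-sum mass constraint: total attention mass M = 1.
/// Renormalizing after any update means concepts compete for mass.
fn renormalize(masses: &mut [f64]) {
    let total: f64 = masses.iter().sum();
    if total > 0.0 {
        for m in masses.iter_mut() {
            *m /= total;
        }
    }
}

/// Split coupling strength between the conscious and subconscious
/// manifolds so that K_CON + K_SUB = 1 holds by construction.
fn coupling_split(k_con: f64) -> (f64, f64) {
    let k = k_con.clamp(0.0, 1.0);
    (k, 1.0 - k)
}
```

Raising one concept's mass necessarily lowers the others' — that is what makes the allocation zero-sum.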
Geometric Memory
All constants derive from φ (golden ratio) and π.
Quaternion positions
Each word instance lives on S³. SLERP interpolation along geodesics creates continuous movement.
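Geodesic movement on S³ is standard quaternion SLERP. A dependency-free sketch of the interpolation, not attention-matters' actual implementation:

```rust
/// A point on the unit 3-sphere S^3, stored as a quaternion (w, x, y, z).
#[derive(Clone, Copy)]
struct Quat([f64; 4]);

impl Quat {
    fn dot(self, other: Quat) -> f64 {
        self.0.iter().zip(other.0.iter()).map(|(a, b)| a * b).sum()
    }

    fn normalize(mut self) -> Quat {
        let n = self.dot(self).sqrt();
        for c in self.0.iter_mut() {
            *c /= n;
        }
        self
    }
}

/// Spherical linear interpolation along the geodesic from `a` to `b`.
/// t = 0 gives `a`, t = 1 gives `b`; intermediate t moves along the arc,
/// which is what makes drift continuous rather than a discrete jump.
fn slerp(a: Quat, b: Quat, t: f64) -> Quat {
    let cos_theta = a.dot(b).clamp(-1.0, 1.0);
    let theta = cos_theta.acos();
    if theta.abs() < 1e-9 {
        return a; // points coincide; nothing to interpolate
    }
    let sin_theta = theta.sin();
    let wa = ((1.0 - t) * theta).sin() / sin_theta;
    let wb = (t * theta).sin() / sin_theta;
    let mut out = [0.0; 4];
    for i in 0..4 {
        out[i] = wa * a.0[i] + wb * b.0[i];
    }
    Quat(out).normalize()
}
```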
IDF-weighted drift
Query activation pulls related occurrences closer. The manifold actively reshapes with every query.
Dual manifolds
Conscious and subconscious manifolds with phasor interference. Not all knowledge should be equally accessible. Some surfaces only when the right query activates the right pattern.
Kuramoto coupling
Phases synchronize dynamically between co-activated neighborhoods. Related concepts form natural clusters that emerge from interaction rather than being programmed.
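The coupling dynamic is the classic Kuramoto model, dθᵢ/dt = ωᵢ + (K/N) Σⱼ sin(θⱼ − θᵢ). A minimal Euler-integration sketch with the usual order parameter r measuring synchrony — this is the textbook model, not the engine's internals:

```rust
/// One Euler step of the Kuramoto model. Co-activated neighborhoods
/// pull each other's phases into sync via the sin coupling term.
fn kuramoto_step(phases: &mut [f64], omegas: &[f64], k: f64, dt: f64) {
    let n = phases.len() as f64;
    let snapshot = phases.to_vec();
    for i in 0..phases.len() {
        let coupling: f64 = snapshot
            .iter()
            .map(|&theta_j| (theta_j - snapshot[i]).sin())
            .sum();
        phases[i] += dt * (omegas[i] + (k / n) * coupling);
    }
}

/// Order parameter r in [0, 1]: 1 means fully synchronized phases.
fn order_parameter(phases: &[f64]) -> f64 {
    let n = phases.len() as f64;
    let (re, im) = phases
        .iter()
        .fold((0.0, 0.0), |(re, im), &t| (re + t.cos(), im + t.sin()));
    ((re / n).powi(2) + (im / n).powi(2)).sqrt()
}
```

Start four phases spread apart with identical natural frequencies and iterate: r climbs toward 1 as the cluster emerges from interaction, exactly the "emerge rather than being programmed" behavior.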
context-matters
Structured context store. Facts, decisions, patterns, trade-offs, and corrections persisted across agent lifetimes. The operational memory of the system.
Hierarchical Scopes
global > project > repo > session. Visibility flows downward automatically. A project-level decision surfaces when an agent queries at repo scope. The hierarchy is the frame.
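Downward visibility reduces to an ordering check. A sketch, with scope names taken from the hierarchy above and everything else illustrative:

```rust
/// Scope hierarchy: global > project > repo > session.
/// Discriminants encode depth: smaller means broader.
#[derive(Clone, Copy)]
enum Scope {
    Global = 0,
    Project = 1,
    Repo = 2,
    Session = 3,
}

/// Visibility flows downward: an entry stored at `entry_scope` is
/// visible to a query at `query_scope` iff the entry sits at or
/// above the query in the hierarchy.
fn visible(entry_scope: Scope, query_scope: Scope) -> bool {
    (entry_scope as u8) <= (query_scope as u8)
}
```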
BLAKE3 Deduplication
Content-addressed storage prevents duplication across agents. Agent A does not know what Agent B already stored. context-matters does.
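Content addressing makes cross-agent dedup a key collision: the key is a hash of the content itself, so identical deposits from different agents collapse to one entry. A sketch of the mechanism — context-matters uses BLAKE3, while std's `DefaultHasher` stands in here to keep the example dependency-free (it is not collision-resistant and only illustrates the idea):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content-addressed store: keyed by a hash of the content,
/// so Agent A and Agent B depositing the same fact store it once.
struct ContentStore {
    entries: HashMap<u64, String>,
}

impl ContentStore {
    fn new() -> Self {
        ContentStore { entries: HashMap::new() }
    }

    /// Returns true if the content was new, false if it was a duplicate.
    fn save(&mut self, content: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        self.entries
            .insert(hasher.finish(), content.to_string())
            .is_none()
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Neither agent needs to know what the other stored; the hash of the content answers that question for them.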
Typed Entries
Eight entry kinds with priority ordering. Facts, decisions, patterns, preferences, observations, lessons, feedback, assessments. Feedback entries receive highest recall priority.
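Priority ordering falls out of an ordered enum. The source fixes only that feedback ranks highest; the relative order of the other seven kinds below is an assumption for illustration:

```rust
/// The eight entry kinds. Discriminants encode recall priority;
/// Feedback is first, as stated. The ordering of the remaining
/// kinds is an illustrative assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum EntryKind {
    Feedback = 0,
    Lesson = 1,
    Decision = 2,
    Fact = 3,
    Pattern = 4,
    Preference = 5,
    Observation = 6,
    Assessment = 7,
}

/// Sort entries into recall order using the derived Ord.
fn recall_order(mut kinds: Vec<EntryKind>) -> Vec<EntryKind> {
    kinds.sort();
    kinds
}
```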
SQLite + FTS5
Full-text search over structured entries. No external dependencies. Ships as a single binary. Rust.
fmm
Structural intelligence for codebases. A single SQLite database at the project root indexing exports, imports, dependencies, line counts, and file outlines. Structural lookups at O(1).
Without fmm
- 30+ file reads to orient
- Structure reconstructed from scattered grep results

With fmm
- 5 structural queries
- Directory shape, key files, dependency impact up front
MCP Tools
| Tool | Purpose |
|---|---|
| fmm_lookup_export | Find which file defines a symbol at O(1) |
| fmm_read_symbol | Extract exact source following re-export chains |
| fmm_dependency_graph | Intra-project deps, external packages, and downstream blast radius |
| fmm_file_outline | Table of contents with line ranges |
| fmm_list_files | Full project topology in one call |
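The O(1) claim behind fmm_lookup_export amounts to a flat symbol table plus re-export chasing. A sketch with illustrative names and paths, not fmm's actual schema:

```rust
use std::collections::HashMap;

/// Export map: symbol -> defining file, with a separate table of
/// re-exports pointing back at original symbols. A flat map makes
/// lookup O(1) instead of a grep across the tree.
struct ExportMap {
    exports: HashMap<String, String>,   // symbol -> file
    reexports: HashMap<String, String>, // alias -> original symbol
}

impl ExportMap {
    /// Resolve a symbol to its defining file, chasing re-export chains.
    fn lookup(&self, symbol: &str) -> Option<&String> {
        let mut current = symbol;
        let mut hops = 0;
        while let Some(original) = self.reexports.get(current) {
            current = original.as_str();
            hops += 1;
            if hops > 64 {
                break; // guard against cyclic re-exports
            }
        }
        self.exports.get(current)
    }
}
```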
markdown-matters
Structural intelligence for documentation. AST-aware semantic chunking and token-bounded retrieval. Your agent gets relevant sections under budget, not entire directories. TypeScript + Effect.
AST-Aware Chunking
Breaks markdown at logical header boundaries, preserving parent-child semantic relationships rather than arbitrary character limits.
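Header-boundary chunking with preserved heading paths can be sketched as a single pass over the lines. This handles ATX (`#`) headings only; the real AST-aware chunker covers much more:

```rust
/// Split markdown at header boundaries, tagging each chunk with its
/// full heading path so parent-child relationships survive the split.
fn chunk_by_headers(doc: &str) -> Vec<(Vec<String>, String)> {
    let mut path: Vec<String> = Vec::new();
    let mut chunks: Vec<(Vec<String>, String)> = Vec::new();
    for line in doc.lines() {
        let hashes = line.chars().take_while(|&c| c == '#').count();
        if hashes > 0 && line.chars().nth(hashes) == Some(' ') {
            // New heading: pop back to the parent level, descend.
            let title = line[hashes + 1..].trim().to_string();
            path.truncate(hashes - 1);
            path.push(title);
            chunks.push((path.clone(), String::new()));
        } else if let Some(last) = chunks.last_mut() {
            // Body text accumulates under the current heading path.
            last.1.push_str(line);
            last.1.push('\n');
        }
    }
    chunks
}
```

Because each chunk carries its path (e.g. Guide > Setup), retrieval can return a section with its ancestry intact rather than an orphaned paragraph.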
Token Bounding
Retrieval strictly respects context window budgets. Sections that exceed the budget are automatically summarized or intelligently truncated.
Hybrid Search
BM25 + semantic search. Maps queries to the most relevant document nodes using fast, local embeddings before any full LLM synthesis.
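One common way to fuse a lexical score with a semantic one is a normalized weighted sum. The 50/50 weighting below is an illustrative assumption, not markdown-matters' actual fusion rule:

```rust
/// Fuse a BM25 score and a cosine-similarity score into one ranking
/// signal. BM25 is unbounded, so it is normalized against the best
/// lexical hit; cosine similarity of embeddings lies in [-1, 1] and
/// is shifted into [0, 1]. The equal weighting is an assumption.
fn hybrid_score(bm25: f64, cosine: f64, bm25_max: f64) -> f64 {
    let lexical = if bm25_max > 0.0 { bm25 / bm25_max } else { 0.0 };
    let semantic = (cosine + 1.0) / 2.0;
    0.5 * lexical + 0.5 * semantic
}
```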
MCP Tools
| Tool | Purpose |
|---|---|
| md_search | Semantic search across all indexed documentation nodes |
| md_context | Retrieve a specific document section by its AST heading path |
| md_structure | Hierarchical structural outline for fast navigation |
Inter-agent messaging. SQLite registry, file-based mailboxes, tmux nudges. Direct, role-based, and broadcast addressing. No central daemon. Coordination through shared state.
Three addressing modes
Direct (agent to agent), role-based (to any agent filling a role), broadcast (to all). The right granularity for the right message.
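The three modes reduce to a small dispatch over an address enum. A sketch with illustrative types:

```rust
/// The three addressing modes (types are illustrative).
enum Address {
    Direct(String), // a specific agent id
    Role(String),   // any agent filling this role
    Broadcast,      // everyone
}

struct Agent {
    id: String,
    role: String,
}

/// Which agents should receive a message with this address?
fn recipients<'a>(agents: &'a [Agent], addr: &Address) -> Vec<&'a str> {
    agents
        .iter()
        .filter(|a| match addr {
            Address::Direct(id) => &a.id == id,
            Address::Role(role) => &a.role == role,
            Address::Broadcast => true,
        })
        .map(|a| a.id.as_str())
        .collect()
}
```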
No central daemon
Each agent spawns its own bus process. Shared filesystem state for coordination. If one agent crashes, the others continue. Graceful degradation by design.
tmux integration
Agents in tmux panes receive nudges when messages arrive. Native terminal workflow. No browser, no electron, no overhead.
nancyr
Multi-agent orchestrator. Context routing between agents, token budgets, adaptive coordination. Decides how context moves, what passes to which agent, and when to intervene. Rust.
Where this started
Before nancyr, there was nancy: an autonomous task execution loop with context awareness and token management. The current runtime grows out of that earlier experiment in iterative agent work.
The four learning loops
Did the team complete the task well?
Signals: correctness, speed, rework, regressions
Was the work split correctly?
Signals: duplicate effort, poor sequencing, blocked deps
Did the protocol help or hinder?
Signals: bus message quality, escalation policy clarity
Are the roles themselves right?
Signals: recurring confusion, missing specialist roles
The Theory
This architecture came from asking what origin-of-life research, autocatalytic closure theory, and thermodynamics teach us about building autonomous systems.
Autocatalytic closure
Stuart Kauffman's core insight: a system becomes self-sustaining when its components form closed loops of mutual production. No single component catalyzes itself, but the set collectively catalyzes its own existence. There is a critical threshold of diversity below which nothing sustains and above which closure becomes inevitable.
Five capabilities form the minimum autocatalytic set for agency: OBSERVE, DELIBERATE, ACT, EVALUATE, REMEMBER. Remove any one and the system either collapses or drifts into error catastrophe. In a multi-agent system, these capabilities distribute across specialists that catalyze each other.
Context rot
The thermodynamic constraint. Empirically validated across every frontier model: performance degrades as context grows, effective capacity is 50-75% of advertised limits, and coherent prose can hurt more than help.
The engineering response is compression at every boundary. Each adapter transforms raw context into a representation that maximizes signal per token. The proxy LLM curates both reads and writes because agents cannot be trusted to compress well.
Three message types
Inter-agent messages in an autocatalytic system are catalytic signals, not data transfer. Each type has distinct compression requirements and flow direction:
SUBSTRATE ↑
Upward flow. Compressed observations. Lossy is acceptable. Volume reduced to signal.
FRAME ↓
Downward flow. Interpretive context that reshapes processing. Small input, massive leverage.
REPAIR ↔
Lateral flow. Error correction. Targeted and actionable. The Eigen's paradox resolution.
The theoretical framework lives in docs.llm/: twelve documents tracing the path from Von Neumann's constructor duality through Kauffman's autocatalytic sets, Eigen's error threshold, the compression problem, and context rot, to their concrete implementation in this architecture.
What's next
The context problem is largely solved. The next problem: how does the system that produces agent behavior improve itself over time?
The system genome
The system has a genome: agent definitions, skills, prompts, MCP server code, context configuration, orchestration patterns. Each is independently modifiable. Context is what flows through the system. The genome is what shapes it.
| Unit | Example | Mutation cost |
|---|---|---|
| Prompt | System prompt for an agent | Low |
| Skill | analyze_blast_radius | Medium |
| Agent definition | Role, constraints, persona | Medium |
| MCP endpoint | /helix/recall handler | High |
| Orchestration | Warroom team composition | High |
The CRITIC
A process that observes system-level signals (task outcomes, token economics, human corrections, context quality), diagnoses which genome unit is responsible for observed behavior, proposes targeted mutations, tests them in controlled conditions, and selects improvements.
Every human correction is a fitness signal that tells you which genome unit to mutate. Wrong approach points to the agent definition. Wrong tool use points to skills. Wrong reasoning points to the prompt. The correction type localizes the mutation.
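That localization rule is a direct mapping, which encoding makes concrete (variant names are illustrative):

```rust
/// Genome units that a correction can localize to.
#[derive(Debug, PartialEq)]
enum GenomeUnit {
    AgentDefinition,
    Skill,
    Prompt,
}

/// Kinds of human correction, as named in the text.
enum CorrectionKind {
    WrongApproach,
    WrongToolUse,
    WrongReasoning,
}

/// Map a correction type to the genome unit it tells you to mutate.
fn mutation_target(correction: &CorrectionKind) -> GenomeUnit {
    match correction {
        CorrectionKind::WrongApproach => GenomeUnit::AgentDefinition,
        CorrectionKind::WrongToolUse => GenomeUnit::Skill,
        CorrectionKind::WrongReasoning => GenomeUnit::Prompt,
    }
}
```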
The CRITIC uses attention-matters as the fitness function. Mutations must align with values, not just improve metrics. The owner shapes identity. The CRITIC evaluates mutations against that identity. The system evolves toward the owner's vision.
Phase 1
Owner IS the critic
Phase 2
Owner WITH tooling
Phase 3
CRITIC proposes, owner approves
Phase 4
Autonomous within guardrails
The agent as self-evolver
Agents are not passive consumers of infrastructure. They have primitives. They are upstream. They should experiment with their own tooling, context queries, and work patterns, measure outcomes, and adjust. Four evolution levels run simultaneously:
Agent self-evolution
Fast, local, full authority. The agent experiments with how it queries helix, which tools it calls, how it decomposes tasks. Discovers what works. Deposits those discoveries.
Context evolution
Cross-session, curated. Agent knowledge deposits improve future agents. The proxy LLM quality-controls the write path. Better deposits produce better recalls.
System evolution
Deliberate, tested. The genome: skills, prompts, configs, code. CRITIC proposes mutations from aggregated agent assessments. Owner approves.
Identity evolution
Owner exclusive. attention-matters. The geometric manifold that shapes the fitness landscape for everything else. The owner curates the terrain. The system evolves to thrive on it.
The gap between "agents that use tools" and "agents that evolve their own tooling" is the gap between a pipeline and a living system.