Process documents 100x larger than your LLM's context window, without vector databases or chunking heuristics.
LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.
Based on the Recursive Language Models paper.
Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleus, a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   LLM Reasons   │────▶│ Nucleus Command │
│ "total sales?"  │     │  about intent   │     │  (sum RESULTS)  │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│  Final Answer   │◀────│ Lattice Engine  │◀────│     Parser      │
│   13,000,000    │     │    Executes     │     │    Validates    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
Why this works better than code generation:
- Reduced entropy - Nucleus has a rigid grammar with fewer valid outputs than JavaScript
- Fail-fast validation - Parser rejects malformed commands before execution
- Safe execution - Lattice only executes known operations, no arbitrary code
- Small model friendly - 7B models handle symbolic grammars better than freeform code
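The fail-fast and safe-execution points above can be illustrated with a minimal sketch. This is not the Lattice parser: `parse`, `validate`, and the `KNOWN_OPS` whitelist are hypothetical names, and the whitelist only covers operations that appear in this README's examples.

```typescript
// Hypothetical sketch: a tiny S-expression reader plus an operation whitelist.
// It is NOT the Lattice implementation; all names here are illustrative.
type SExpr = string | number | SExpr[];

// Assumed whitelist, taken from commands shown in this README.
const KNOWN_OPS = new Set(["grep", "filter", "map", "sum", "count", "lambda", "match"]);

function tokenize(src: string): string[] {
  // Quoted strings stay intact as one token; parentheses are their own tokens.
  return src.match(/"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+/g) ?? [];
}

function parse(src: string): SExpr {
  const tokens = tokenize(src);
  let pos = 0;

  function read(): SExpr {
    if (pos >= tokens.length) throw new Error("Parse error: unexpected end of input");
    const tok = tokens[pos++];
    if (tok === "(") {
      const list: SExpr[] = [];
      while (tokens[pos] !== ")") {
        if (pos >= tokens.length) throw new Error("Parse error: missing ')'");
        list.push(read());
      }
      pos++; // consume ")"
      return list;
    }
    if (tok === ")") throw new Error("Parse error: unexpected ')'");
    const num = Number(tok);
    return Number.isNaN(num) ? tok : num; // numbers become numbers, the rest stay strings
  }

  const expr = read();
  if (pos !== tokens.length) throw new Error("Parse error: trailing tokens");
  return expr;
}

// Reject any list whose head is not a whitelisted operation, before execution.
function validate(expr: SExpr): void {
  if (!Array.isArray(expr)) return;
  const head = expr[0];
  if (typeof head !== "string" || !KNOWN_OPS.has(head)) {
    throw new Error(`Unknown operation: ${String(head)}`);
  }
  expr.slice(1).forEach(validate);
}

const ast = parse('(filter RESULTS (lambda x (match x "timeout" 0)))');
validate(ast); // passes: every list head is a known operation
```

Because validation happens on the parsed tree, a malformed or unknown command is rejected without anything being executed.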
The LLM outputs commands in the Nucleus DSL, an S-expression language designed for document analysis:
; Search for patterns
(grep "ERROR")
; Filter results
(filter RESULTS (lambda x (match x "timeout" 0)))
; Aggregate
(sum RESULTS) ; Auto-extracts numbers from lines
(count RESULTS) ; Count matching items
; Final answer
<<<FINAL>>>13000000<<<END>>>
The Lattice engine (src/logic/) processes Nucleus commands:
- Parser (lc-parser.ts) - Parses S-expressions into an AST
- Type Inference (type-inference.ts) - Validates types before execution
- Constraint Resolver (constraint-resolver.ts) - Handles symbolic constraints like [Σ⚡μ]
- Solver (lc-solver.ts) - Executes commands against the document
Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.
For large result sets, RLM uses a handle-based architecture with in-memory SQLite (src/persistence/) that achieves 97%+ token savings:
Traditional: LLM sees full array [15,000 tokens for 1000 results]
Handle-based: LLM sees stub [50 tokens: "$res1: Array(1000) [preview...]"]
How it works:
- Results are stored in SQLite with FTS5 full-text indexing
- LLM receives only handle references ($res1, $res2, etc.)
- Operations execute server-side, returning new handles
- Full data is only materialized when needed
Components:
- SessionDB - In-memory SQLite with FTS5 for fast full-text search
- HandleRegistry - Stores arrays, returns compact handle references
- HandleOps - Server-side filter/map/count/sum on handles
- FTS5Search - Phrase queries, boolean operators, relevance ranking
- CheckpointManager - Save/restore session state
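The handle idea can be sketched in a few lines. In this sketch a plain `Map` stands in for the in-memory SQLite store, and `HandleStore` with its methods is a hypothetical name, not the project's `HandleRegistry` or `HandleOps` API.

```typescript
// Hypothetical sketch of handle-based results. A Map stands in for the
// in-memory SQLite store; class and method names are illustrative only.
class HandleStore {
  private data = new Map<string, string[]>();
  private next = 1;

  // Store a result array and return only its handle.
  put(rows: string[]): string {
    const handle = `$res${this.next++}`;
    this.data.set(handle, rows);
    return handle;
  }

  // Compact stub shown to the LLM instead of the full array.
  stub(handle: string): string {
    const rows = this.rows(handle);
    return `${handle}: Array(${rows.length}) [${rows.slice(0, 2).join(" | ")}...]`;
  }

  // Server-side filter: rows stay in the store, only a new handle is returned.
  // Pass a regex without the "g" flag so test() stays stateless.
  filter(handle: string, pattern: RegExp): string {
    return this.put(this.rows(handle).filter((row) => pattern.test(row)));
  }

  count(handle: string): number {
    return this.rows(handle).length;
  }

  // Materialize rows only on request, with optional limit/offset.
  expand(handle: string, limit = Infinity, offset = 0): string[] {
    return this.rows(handle).slice(offset, offset + limit);
  }

  private rows(handle: string): string[] {
    const rows = this.data.get(handle);
    if (!rows) throw new Error(`Unknown handle: ${handle}`);
    return rows;
  }
}

const store = new HandleStore();
const all = store.put(["ERROR timeout", "INFO ok", "ERROR disk full"]);
const timeouts = store.filter(all, /timeout/);
console.log(store.stub(all), store.count(timeouts));
```

The model only ever sees the short stub string, which is where the token savings come from. The full rows are read only when `expand` is called.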
The Lattice engine doubles as a context memory for LLM agents. Instead of roundtripping large text blobs in every message, agents stash context server-side and carry only compact handle stubs:
Agent reads file, summarizes → lattice_memo "auth architecture"
                             → $memo1: "auth architecture" (2.1KB, 50 lines)
20 messages later, needs it  → lattice_expand $memo1
                             → Full 50-line summary
Token math (30-message session, 3 source files stashed):
- Traditional roundtripping: 836K tokens
- Memo-based (stubs + 6 expands): 57K tokens (93% savings)
Memos persist across document loads (lattice_load clears query handles but keeps memos), support LRU eviction (100 memo cap, 10MB budget), and can be explicitly deleted when stale. No document needs to be loaded to use memos.
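A rough sketch of memo storage with LRU eviction follows. `MemoPad` and its methods are hypothetical names. The default caps mirror the limits stated above, and size is approximated by string length rather than true byte count.

```typescript
// Hypothetical sketch of memo storage with LRU eviction. Not the project's API.
class MemoPad {
  private memos = new Map<string, { label: string; content: string }>();
  private next = 1;
  private size = 0; // approximation: string length, not exact bytes
  private maxMemos: number;
  private maxSize: number;

  constructor(maxMemos = 100, maxSize = 10 * 1024 * 1024) {
    this.maxMemos = maxMemos;
    this.maxSize = maxSize;
  }

  // Stash content and return a compact handle.
  memo(content: string, label: string): string {
    const handle = `$memo${this.next++}`;
    this.memos.set(handle, { label, content });
    this.size += content.length;
    this.evict();
    return handle;
  }

  // Return the full content and mark the memo as most recently used.
  expand(handle: string): string | undefined {
    const entry = this.memos.get(handle);
    if (!entry) return undefined;
    this.memos.delete(handle);
    this.memos.set(handle, entry); // re-insert moves it to the end of the order
    return entry.content;
  }

  remove(handle: string): boolean {
    const entry = this.memos.get(handle);
    if (!entry) return false;
    this.size -= entry.content.length;
    return this.memos.delete(handle);
  }

  private evict(): void {
    // Map keeps insertion order, so the first key is the least recently used.
    while (this.memos.size > this.maxMemos || this.size > this.maxSize) {
      const oldest = this.memos.keys().next().value;
      if (oldest === undefined) break;
      this.remove(oldest);
    }
  }
}

const pad = new MemoPad(2); // tiny cap to make eviction visible
const auth = pad.memo("auth summary", "auth architecture");
const perf = pad.memo("perf notes", "perf bottlenecks");
pad.expand(auth);                     // auth is now most recently used
pad.memo("db notes", "db schema");    // exceeds the cap, so perf is evicted
console.log(pad.expand(perf));        // undefined
```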
The LLM does reasoning, not code generation:
- Understands intent - Interprets "total of north sales" as needing grep + filter + sum
- Chooses operations - Decides which Nucleus commands achieve the goal
- Verifies results - Checks if the current results answer the query
- Iterates - Refines search if results are too broad or narrow
The LLM never writes JavaScript. It outputs Nucleus commands that Lattice executes safely.
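On each turn the model's output is either a Nucleus command or a final answer wrapped in the `<<<FINAL>>>` and `<<<END>>>` markers. A minimal sketch of that classification step is shown here. `classifyTurn` and the `Turn` type are hypothetical and not taken from the project source.

```typescript
// Hypothetical sketch: classify one model turn as a final answer or a command.
type Turn =
  | { kind: "final"; answer: string }
  | { kind: "command"; term: string }
  | { kind: "invalid" };

function classifyTurn(output: string): Turn {
  const final = output.match(/<<<FINAL>>>([\s\S]*?)<<<END>>>/);
  if (final) return { kind: "final", answer: final[1].trim() };

  // Take the first parenthesized term, matching parentheses by depth.
  // Limitation: unbalanced parentheses inside string literals are not handled.
  const start = output.indexOf("(");
  if (start === -1) return { kind: "invalid" };
  let depth = 0;
  for (let i = start; i < output.length; i++) {
    if (output[i] === "(") depth++;
    else if (output[i] === ")" && --depth === 0) {
      return { kind: "command", term: output.slice(start, i + 1) };
    }
  }
  return { kind: "invalid" }; // parentheses never closed
}

console.log(classifyTurn('Searching first.\n(grep "ERROR")'));
console.log(classifyTurn("<<<FINAL>>>42<<<END>>>"));
```

Anything that is neither a marked answer nor a balanced term is treated as invalid, so free-form text from the model never reaches the engine.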
| Component | Purpose |
|---|---|
| Nucleus Adapter | Prompts LLM to output Nucleus commands |
| Lattice Parser | Parses S-expressions to AST |
| Lattice Solver | Executes commands against document |
| In-Memory Handles | Handle-based storage with FTS5 (97% token savings) |
| Memory Pad | Memo handles for stashing context across turns (93% savings) |
| BM25 + Semantic | Ranked keyword and TF-IDF cosine similarity search |
| RRF Fusion | Reciprocal Rank Fusion for multi-signal search |
| Dampening | Gravity dampening to remove false positives |
| Q-Value Reranker | Learns which lines are useful across turns |
| miniKanren | Relational engine for classification |
| RAG Hints | Few-shot examples from past successes |
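Reciprocal Rank Fusion, listed in the table above, combines several ranked lists by summing 1 / (k + rank) for each item. The sketch below is a generic version of that formula over line numbers. The function name `rrf` is hypothetical, and k = 60 is the constant commonly used for RRF, which may differ from what Lattice uses.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion over ranked line numbers.
// k = 60 is a commonly used constant; the project's value may differ.
function rrf(rankings: number[][], k = 60): Array<{ line: number; score: number }> {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((line, index) => {
      // index is 0-based, so the rank is index + 1.
      scores.set(line, (scores.get(line) ?? 0) + 1 / (k + index + 1));
    });
  }
  const fused: Array<{ line: number; score: number }> = [];
  scores.forEach((score, line) => fused.push({ line, score }));
  return fused.sort((a, b) => b.score - a.score);
}

// Three search signals, each returning line numbers in ranked order.
const fused = rrf([
  [10, 42, 7],  // e.g. grep order
  [42, 99, 10], // e.g. BM25 order
  [42, 7],      // e.g. semantic order
]);
console.log(fused.map((r) => r.line)); // line 42 comes first: all three signals return it
```

Only ranks are used, so signals with incomparable score scales can be merged without normalization.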
Install from npm:
npm install -g matryoshka-rlm
Or run without installing:
npx matryoshka-rlm "How many ERROR entries are there?" ./server.log
The package provides several CLI tools:
| Command | Description |
|---|---|
| rlm | Main CLI for document analysis with LLM reasoning |
| rlm-mcp | MCP server with full RLM + LLM orchestration (analyze_document tool) |
| lattice-mcp | MCP server exposing direct Nucleus commands (no LLM required) |
| lattice-repl | Interactive REPL for Nucleus commands |
| lattice-http | HTTP server for Nucleus queries |
| lattice-pipe | Pipe adapter for programmatic access |
| lattice-setup | Setup script for Claude Code integration |
git clone https://github.com/yogthos/Matryoshka.git
cd Matryoshka
npm install
npm run build
Copy config.example.json to config.json and configure your LLM provider:
{
"llm": {
"provider": "ollama"
},
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434",
"model": "qwen3-coder:30b",
"options": { "temperature": 0.2, "num_ctx": 8192 }
},
"deepseek": {
"baseUrl": "https://api.deepseek.com",
"apiKey": "${DEEPSEEK_API_KEY}",
"model": "deepseek-chat",
"options": { "temperature": 0.2 }
}
}
}
# Basic usage
rlm "How many ERROR entries are there?" ./server.log
# With options
rlm "Count all ERROR entries" ./server.log --max-turns 15 --verbose
# See all options
rlm --help
RLM includes lattice-mcp, an MCP (Model Context Protocol) server for direct access to the Nucleus engine. This allows coding agents to analyze documents with 80%+ token savings compared to reading files directly.
The key advantage is handle-based results: query results are stored server-side in SQLite, and the agent receives compact stubs like $res1: Array(1000) [preview...] instead of full data. Operations chain server-side without roundtripping data.
| Tool | Description |
|---|---|
| lattice_load | Load a document for analysis |
| lattice_query | Execute Nucleus commands on the loaded document |
| lattice_expand | Expand a handle to see full data (with optional limit/offset) |
| lattice_memo | Store arbitrary context as a memo handle (no document required) |
| lattice_memo_delete | Delete a stale memo to free memory |
| lattice_close | Close the session and free memory |
| lattice_status | Get session status, document info, and memo usage |
| lattice_bindings | Show current variable bindings and memo labels |
| lattice_reset | Reset all bindings and memos but keep document loaded |
| lattice_help | Get Nucleus command reference |
{
"mcp": {
"lattice": {
"type": "stdio",
"command": "lattice-mcp"
}
}
}
1. lattice_load("/path/to/large-file.txt") # Load document (use for >500 lines)
2. lattice_query('(grep "ERROR")') # Search - returns handle stub $res1
3. lattice_query('(filter RESULTS ...)') # Narrow down - returns handle stub $res2
4. lattice_query('(count RESULTS)') # Get count without seeing data
5. lattice_expand("$res2", limit=10) # Expand only what you need to see
6. lattice_close() # Free memory when done
Token efficiency tips:
- Query results return handle stubs, not full data
- Use lattice_expand with limit to see only what you need
- Chain grep → filter → count/sum to refine progressively
- Use RESULTS in queries (always points to last result)
- Use $res1, $res2, etc. with lattice_expand to inspect specific results
1. lattice_memo(content="<file summary>", label="auth module") → $memo1 stub
2. lattice_memo(content="<analysis>", label="perf bottlenecks") → $memo2 stub
3. # ... many turns later, need the auth context ...
4. lattice_expand("$memo1") → Full summary
5. lattice_memo_delete("$memo1") → Drop when stale
Memos don't require a loaded document; they create a session automatically. Limits: 100 memos, 10MB total. The least recently used memos are evicted when either limit is exceeded.
import { runRLM } from "matryoshka-rlm/rlm";
import { createLLMClient } from "matryoshka-rlm";
const llmClient = createLLMClient("ollama", {
baseUrl: "http://localhost:11434",
model: "qwen3-coder:30b",
options: { temperature: 0.2 }
});
const result = await runRLM("How many ERROR entries are there?", "./server.log", {
llmClient,
maxTurns: 10,
turnTimeoutMs: 30000,
});
$ rlm "How many ERROR entries are there?" ./server.log --verbose
──────────────────────────────────────────────────
[Turn 1/10] Querying LLM...
[Turn 1] Term: (grep "ERROR")
[Turn 1] Result: 42 matches
──────────────────────────────────────────────────
[Turn 2/10] Querying LLM...
[Turn 2] Term: (count RESULTS)
[Turn 2] Console output:
[Lattice] Counting 42 items
[Turn 2] Result: 42
──────────────────────────────────────────────────
[Turn 3/10] Querying LLM...
[Turn 3] Final answer received
42
The model:
- Searched for relevant data with grep
- Counted the matching results
- Output the final answer
(grep "pattern") ; Regex search, returns matches with line numbers
(fuzzy_search "query" 10) ; Fuzzy search, returns top N matches with scores
(bm25 "query terms" 10) ; BM25 ranked keyword search (TF-IDF scoring)
(semantic "query terms" 10) ; TF-IDF cosine similarity search
(text_stats) ; Document metadata (length, line count, samples)
Combine results from multiple search operations for better relevance:
;; Reciprocal Rank Fusion: merge results from different search signals
(fuse (grep "ERROR") (bm25 "error handling") (semantic "failure"))
;; Gravity dampening: halve scores for false positives lacking query term overlap
(dampen (bm25 "database error") "database error")
;; Q-value reranking: learns which lines are useful across turns
(rerank (fuse (grep "ERROR") (bm25 "error")))
;; Full pipeline: fuse → dampen → rerank
(rerank (dampen (fuse (grep "ERROR") (bm25 "error") (semantic "failure")) "error"))
For code files, Lattice uses tree-sitter to extract structural symbols. This enables code-aware queries that understand functions, classes, methods, and other language constructs.
Built-in languages (packages included):
- TypeScript (.ts, .tsx), JavaScript (.js, .jsx), Python (.py), Go (.go)
- HTML (.html), CSS (.css), JSON (.json)
Additional languages (install package to enable):
- Rust, C, C++, Java, Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Bash, SQL, and more
(list_symbols) ; List all symbols (functions, classes, methods, etc.)
(list_symbols "function") ; Filter by kind: "function", "class", "method", "interface", "type", "struct"
(get_symbol_body "myFunc") ; Get source code body for a symbol by name
(get_symbol_body RESULTS) ; Get body for symbol from previous query result
(find_references "myFunc") ; Find all references to an identifier
Example workflow for code analysis:
1. lattice_load("./src/app.ts") # Load a code file
2. lattice_query('(list_symbols)') # Get all symbols → $res1
3. lattice_query('(list_symbols "function")') # Just functions → $res2
4. lattice_expand("$res2", limit=5) # See function names and line numbers
5. lattice_query('(get_symbol_body "handleRequest")') # Get function body
6. lattice_query('(find_references "handleRequest")') # Find all usages
Symbols include metadata like name, kind, start/end lines, and parent relationships (e.g., methods within classes).
When a code file is loaded, Lattice automatically builds an in-memory knowledge graph that tracks call relationships, inheritance, and interface implementations. This enables structural queries beyond simple text search.
(callers "funcName") ; Who calls this function?
(callees "funcName") ; What does this function call?
(ancestors "ClassName") ; Inheritance chain (extends)
(descendants "ClassName") ; All subclasses (transitive)
(implementations "IFace") ; Classes implementing this interface
(dependents "name") ; All transitive dependents
(dependents "name" 2) ; Dependents within depth limit
(symbol_graph "name" 1) ; Neighborhood subgraph around symbol
Example workflow for call graph analysis:
1. lattice_load("./src/service.ts")
2. lattice_query('(callers "handleRequest")') # Who calls it? → $res1
3. lattice_query('(callees "handleRequest")') # What does it call? → $res2
4. lattice_query('(ancestors "MyService")') # Inheritance chain → $res3
5. lattice_query('(symbol_graph "handleRequest" 2)') # 2-hop neighborhood
The graph is built using line-based heuristics (word-boundary matching for calls, syntax pattern matching for extends/implements), so it produces approximate but useful results without requiring a full language server.
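The heuristic described above can be sketched as follows, assuming function spans are already known (for example from symbol extraction). `FunctionSpan`, `buildCallers`, and `dependents` are hypothetical names. The sketch covers call edges only, not extends/implements, and assumes function names contain only word characters.

```typescript
// Hypothetical sketch: approximate call edges by word-boundary matching,
// then walk callers transitively with an optional depth limit.
interface FunctionSpan {
  name: string;
  start: number; // 0-based first line of the function
  end: number;   // 0-based last line, inclusive
}

// Map each callee name to the set of functions whose body mentions "name(".
function buildCallers(lines: string[], fns: FunctionSpan[]): Map<string, Set<string>> {
  const callers = new Map<string, Set<string>>();
  for (const callee of fns) {
    const pattern = new RegExp(`\\b${callee.name}\\s*\\(`);
    for (const caller of fns) {
      if (caller.name === callee.name) continue; // skip the definition itself
      const body = lines.slice(caller.start, caller.end + 1);
      if (body.some((line) => pattern.test(line))) {
        if (!callers.has(callee.name)) callers.set(callee.name, new Set());
        callers.get(callee.name)!.add(caller.name);
      }
    }
  }
  return callers;
}

// Breadth-first walk over caller edges, stopping after maxDepth hops.
function dependents(callers: Map<string, Set<string>>, name: string, maxDepth = Infinity): string[] {
  const seen = new Set<string>();
  let frontier = [name];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const current of frontier) {
      callers.get(current)?.forEach((caller) => {
        if (caller !== name && !seen.has(caller)) {
          seen.add(caller);
          next.push(caller);
        }
      });
    }
    frontier = next;
  }
  return Array.from(seen);
}

const source = [
  "function parse(x) { return x; }",
  "function handle(x) { return parse(x); }",
  "function main() { handle(1); }",
];
const spans: FunctionSpan[] = [
  { name: "parse", start: 0, end: 0 },
  { name: "handle", start: 1, end: 1 },
  { name: "main", start: 2, end: 2 },
];
const graph = buildCallers(source, spans);
console.log(dependents(graph, "parse"));    // transitive: handle, then main
console.log(dependents(graph, "parse", 1)); // depth 1: handle only
```

Text matching of this kind will also pick up mentions in comments or strings, which is the sort of imprecision the paragraph above accepts in exchange for not needing a language server.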
Matryoshka includes built-in symbol mappings for 20+ languages. To enable a language, install its tree-sitter grammar package:
# Enable Rust support
npm install tree-sitter-rust
# Enable Java support
npm install tree-sitter-java
# Enable Ruby support
npm install tree-sitter-ruby
Languages with built-in mappings:
- TypeScript, JavaScript, Python, Go, Rust, C, C++, Java
- Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Elixir
- HTML, CSS, JSON, YAML, TOML, Markdown, SQL, Bash
Once a package is installed, the language is automatically available for symbol extraction.
For languages without built-in mappings, or to override existing mappings, create a config file at ~/.matryoshka/config.json:
{
"grammars": {
"mylang": {
"package": "tree-sitter-mylang",
"extensions": [".ml", ".mli"],
"moduleExport": "mylang",
"symbols": {
"function_definition": "function",
"method_definition": "method",
"class_definition": "class",
"module_definition": "module"
}
}
}
}
Configuration fields:
| Field | Required | Description |
|---|---|---|
| package | Yes | npm package name for the tree-sitter grammar |
| extensions | Yes | File extensions to associate with this language |
| symbols | Yes | Maps tree-sitter node types to symbol kinds |
| moduleExport | No | Submodule export name (e.g., "typescript" for tree-sitter-typescript) |
Symbol kinds: function, method, class, interface, type, struct, enum, trait, module, variable, constant, property
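Before relying on a custom grammar entry, it can help to check it against the fields and symbol kinds listed above. The following is a hypothetical validator, not part of Matryoshka. The rule that extensions start with a dot is an assumption inferred from the examples.

```typescript
// Hypothetical sketch: validate one entry of the "grammars" config object.
const SYMBOL_KINDS: readonly string[] = [
  "function", "method", "class", "interface", "type", "struct",
  "enum", "trait", "module", "variable", "constant", "property",
];

function validateGrammar(name: string, raw: unknown): string[] {
  const errors: string[] = [];
  const cfg = (raw ?? {}) as Record<string, unknown>;

  if (typeof cfg.package !== "string" || cfg.package === "") {
    errors.push(`${name}: "package" is required`);
  }

  const extensions = cfg.extensions;
  if (!Array.isArray(extensions) || extensions.length === 0) {
    errors.push(`${name}: "extensions" must be a non-empty array`);
  } else {
    for (const ext of extensions) {
      // Assumption: extensions are written with a leading dot, as in the examples.
      if (typeof ext !== "string" || ext.charAt(0) !== ".") {
        errors.push(`${name}: extension ${JSON.stringify(ext)} should start with "."`);
      }
    }
  }

  if (typeof cfg.symbols !== "object" || cfg.symbols === null) {
    errors.push(`${name}: "symbols" is required`);
  } else {
    const symbols = cfg.symbols as Record<string, unknown>;
    for (const node of Object.keys(symbols)) {
      if (SYMBOL_KINDS.indexOf(String(symbols[node])) === -1) {
        errors.push(`${name}: unknown symbol kind ${JSON.stringify(symbols[node])} for "${node}"`);
      }
    }
  }

  if (cfg.moduleExport !== undefined && typeof cfg.moduleExport !== "string") {
    errors.push(`${name}: "moduleExport" must be a string when present`);
  }
  return errors;
}

console.log(validateGrammar("mylang", {
  package: "tree-sitter-mylang",
  extensions: [".ml", ".mli"],
  symbols: { function_definition: "function", class_definition: "class" },
}));
```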
To configure symbol mappings for a new language, you need to know the tree-sitter node types. You can explore them using the tree-sitter CLI:
# Install tree-sitter CLI
npm install -g tree-sitter-cli
# Parse a sample file and see the AST
tree-sitter parse sample.mylang
Or use the tree-sitter playground to explore node types interactively.
Example: Adding OCaml support
- Find the grammar package: tree-sitter-ocaml
- Install it: npm install tree-sitter-ocaml
- Explore the AST to find node types for functions, modules, etc.
- Add to ~/.matryoshka/config.json:
{
"grammars": {
"ocaml": {
"package": "tree-sitter-ocaml",
"extensions": [".ml", ".mli"],
"moduleExport": "ocaml",
"symbols": {
"value_definition": "function",
"let_binding": "variable",
"type_definition": "type",
"module_definition": "module",
"module_type_definition": "interface"
}
}
}
}
Note: Some tree-sitter packages use native Node.js bindings that may not compile on all systems. If installation fails, check whether the package supports your Node.js version or look for WASM alternatives.
(filter RESULTS (lambda x (match x "pattern" 0))) ; Filter by regex
(map RESULTS (lambda x (match x "(\\d+)" 1))) ; Extract from each
(sum RESULTS) ; Sum numbers in results
(count RESULTS) ; Count items
(match str "pattern" 0) ; Regex match, return group N
(replace str "from" "to") ; String replacement
(split str "," 0) ; Split and get index
(parseInt str) ; Parse integer
(parseFloat str) ; Parse float
When the model sees data that needs parsing, it can use declarative type coercion:
; Date parsing (returns ISO format YYYY-MM-DD)
(parseDate "Jan 15, 2024") ; -> "2024-01-15"
(parseDate "01/15/2024" "US") ; -> "2024-01-15" (MM/DD/YYYY)
(parseDate "15/01/2024" "EU") ; -> "2024-01-15" (DD/MM/YYYY)
; Currency parsing (handles $, €, commas, etc.)
(parseCurrency "$1,234.56") ; -> 1234.56
(parseCurrency "€1.234,56") ; -> 1234.56 (EU format)
; Number parsing
(parseNumber "1,234,567") ; -> 1234567
(parseNumber "50%") ; -> 0.5
; General coercion
(coerce value "date") ; Coerce to date
(coerce value "currency") ; Coerce to currency
(coerce value "number") ; Coerce to number
; Extract and coerce in one step
(extract str "\\$[\\d,]+" 0 "currency") ; Extract and parse as currency
Use in map for batch transformations:
; Parse all dates in results
(map RESULTS (lambda x (parseDate (match x "[A-Za-z]+ \\d+, \\d+" 0))))
; Extract and sum currencies
(map RESULTS (lambda x (parseCurrency (match x "\\$[\\d,]+" 0))))
For complex transformations, the model can synthesize functions from examples:
; Synthesize from input/output pairs
(synthesize
("$100" 100)
("$1,234" 1234)
("$50,000" 50000))
; -> Returns a function that extracts numbers from currency strings
This uses Barliman-style relational synthesis with miniKanren to automatically build extraction functions.
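In this style the input/output pairs act as the specification: any function consistent with all of them is an acceptable result. The sketch below shows only that acceptance check with hand-written candidates. It does not perform miniKanren synthesis, and `satisfies`, `candidate`, and `naive` are hypothetical names.

```typescript
// Hypothetical sketch: input/output pairs as a specification.
// The candidates are hand-written, NOT produced by relational synthesis.
type Example = [string, number];

const examples: Example[] = [
  ["$100", 100],
  ["$1,234", 1234],
  ["$50,000", 50000],
];

// True when fn reproduces the expected output for every example input.
function satisfies(fn: (s: string) => number, pairs: Example[]): boolean {
  return pairs.every(([input, output]) => fn(input) === output);
}

// Candidate 1: strip everything except digits and dots, then convert.
const candidate = (s: string): number => Number(s.replace(/[^0-9.]/g, ""));

// Candidate 2: drop the leading symbol and parse; stops at the first comma.
const naive = (s: string): number => parseFloat(s.slice(1));

console.log(satisfies(candidate, examples)); // true
console.log(satisfies(naive, examples));     // false: "$1,234" parses as 1
```

A synthesizer searches the space of such candidates automatically. The check that decides whether a candidate is acceptable stays the same.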
Results from previous turns are available:
- RESULTS - Latest array result (updated by grep, filter)
- _0, _1, _2, ... - Results from specific turns
<<<FINAL>>>your answer here<<<END>>>
Symptom: The model provides an answer immediately with hallucinated data.
Solutions:
- Use a more capable model (7B+ recommended)
- Be specific in your query: "Find lines containing ERROR and count them"
Symptom: "Max turns (N) reached without final answer"
Solutions:
- Increase --max-turns for complex documents
- Check --verbose output for repeated patterns (model stuck in loop)
- Simplify the query
Symptom: "Parse error: no valid command"
Cause: The model produced a malformed S-expression.
Solutions:
- The system auto-converts JSON to S-expressions as a fallback
- Use --verbose to see what the model is generating
- Try a different model tuned for code/symbolic output
npm test # Run tests
npm test -- --coverage # With coverage
RUN_E2E=1 npm test -- tests/e2e.test.ts # E2E tests (requires Ollama)
npm run build # Build
npm run typecheck # Type check
src/
├── adapters/                   # Model-specific prompting
│   ├── nucleus.ts              # Nucleus DSL adapter
│   └── types.ts                # Adapter interface
├── logic/                      # Lattice engine
│   ├── lc-parser.ts            # Nucleus parser
│   ├── lc-solver.ts            # Command executor (uses miniKanren)
│   ├── type-inference.ts
│   ├── constraint-resolver.ts
│   ├── bm25.ts                 # BM25 keyword search (from Ori-Mnemos)
│   ├── semantic.ts             # TF-IDF cosine similarity search
│   ├── rrf.ts                  # Reciprocal Rank Fusion (from Ori-Mnemos)
│   ├── dampening.ts            # Gravity dampening (from Ori-Mnemos)
│   ├── qvalue.ts               # Q-value learning reranker (from Ori-Mnemos)
│   └── stopwords.ts            # Shared stopword set
├── persistence/                # In-memory handle storage (97% token savings)
│   ├── session-db.ts           # In-memory SQLite with FTS5
│   ├── handle-registry.ts      # Handle creation and stubs
│   ├── handle-ops.ts           # Server-side operations
│   ├── fts5-search.ts          # Full-text search
│   └── checkpoint.ts           # Session persistence
├── treesitter/                 # Code-aware symbol extraction
│   ├── parser-registry.ts      # Tree-sitter parser management
│   ├── symbol-extractor.ts     # AST → symbol extraction
│   ├── language-map.ts         # Extension → language mapping
│   └── types.ts                # Symbol interfaces
├── engine/                     # Nucleus execution engine
│   ├── nucleus-engine.ts
│   └── handle-session.ts       # Session with symbol support
├── minikanren/                 # Relational programming engine
├── synthesis/                  # Program synthesis (Barliman-style)
│   └── evalo/                  # Extractor DSL
├── rag/                        # Few-shot hint retrieval
└── rlm.ts                      # Main execution loop
This project incorporates ideas and code from:
- Ori-Mnemos - A persistent memory infrastructure for AI agents implementing the Recursive Memory Harness framework. BM25 search, Reciprocal Rank Fusion, gravity dampening, and Q-value learning reranking were ported from Ori-Mnemos and adapted for line-based document analysis.
- Nucleus - A symbolic S-expression language by Michael Whitford. RLM uses Nucleus syntax for the constrained DSL that the LLM outputs, providing a rigid grammar that reduces model errors.
- ramo - A miniKanren implementation in TypeScript by Will Lewis. Used for constraint-based program synthesis.
- Barliman - A prototype smart editor by William Byrd and Greg Rosenblatt that uses program synthesis to assist programmers. The Barliman-style approach of providing input/output constraints instead of code inspired the synthesis workflow.
- tree-sitter - A parser generator tool and incremental parsing library. Used for extracting structural symbols (functions, classes, methods) from code files to enable code-aware queries.
MIT
