freshcrate
Skin:/
Home > MCP Servers > Matryoshka

Matryoshka

MCP server for token-efficient large document analysis via the use of REPL state

Why this rank:Strong adoptionRecent releaseHealthy release cadence

Description

MCP server for token-efficient large document analysis via the use of REPL state

README

Matryoshka

Tests

Process documents 100x larger than your LLM's context windowโ€”without vector databases or chunking heuristics.

The Problem

LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.

Based on the Recursive Language Models paper.

How It Works

Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleusโ€”a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   User Query    โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚   LLM Reasons   โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ Nucleus Command โ”‚
โ”‚ "total sales?"  โ”‚     โ”‚  about intent   โ”‚     โ”‚  (sum RESULTS)  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                         โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Final Answer   โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚ Lattice Engine  โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚     Parser      โ”‚
โ”‚   13,000,000    โ”‚     โ”‚    Executes     โ”‚     โ”‚    Validates    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Why this works better than code generation:

  1. Reduced entropy - Nucleus has a rigid grammar with fewer valid outputs than JavaScript
  2. Fail-fast validation - Parser rejects malformed commands before execution
  3. Safe execution - Lattice only executes known operations, no arbitrary code
  4. Small model friendly - 7B models handle symbolic grammars better than freeform code

Architecture

The Nucleus DSL

The LLM outputs commands in the Nucleus DSLโ€”an S-expression language designed for document analysis:

; Search for patterns
(grep "ERROR")

; Filter results
(filter RESULTS (lambda x (match x "timeout" 0)))

; Aggregate
(sum RESULTS)    ; Auto-extracts numbers from lines
(count RESULTS)  ; Count matching items

; Final answer
<<<FINAL>>>13000000<<<END>>>

The Lattice Engine

The Lattice engine (src/logic/) processes Nucleus commands:

  1. Parser (lc-parser.ts) - Parses S-expressions into an AST
  2. Type Inference (type-inference.ts) - Validates types before execution
  3. Constraint Resolver (constraint-resolver.ts) - Handles symbolic constraints like [ฮฃโšกฮผ]
  4. Solver (lc-solver.ts) - Executes commands against the document

Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.

In-Memory Handle Storage

For large result sets, RLM uses a handle-based architecture with in-memory SQLite (src/persistence/) that achieves 97%+ token savings:

Traditional:  LLM sees full array    [15,000 tokens for 1000 results]
Handle-based: LLM sees stub          [50 tokens: "$grep_error: Array(1000) [preview...]"]

How it works:

  1. Results are stored in SQLite with FTS5 full-text indexing
  2. LLM receives descriptive handle references derived from the command (e.g., $grep_error, $bm25_timeout, $filter_status)
  3. Operations execute server-side, returning new handles
  4. Full data is only materialized when needed

Handle names are auto-generated from the Nucleus command: (grep "ERROR") produces $grep_error, (list_symbols "function") produces $list_symbols_function. Repeated commands get a numeric suffix ($grep_error_2, $grep_error_3).

Memory Pad

The Lattice engine doubles as a context memory for LLM agents. Instead of roundtripping large text blobs in every message, agents stash context server-side and carry only compact handle stubs:

Agent reads file, summarizes โ†’ lattice_memo "auth architecture"
                              โ†’ $memo_auth_architecture: "auth architecture" (2.1KB, 50 lines)

20 messages later, needs it  โ†’ lattice_expand $memo_auth_architecture
                              โ†’ Full 50-line summary

Token math (30-message session, 3 source files stashed):

  • Traditional roundtripping: 836K tokens
  • Memo-based (stubs + 6 expands): 57K tokens โ€” 93% savings

Memos persist across document loads (lattice_load clears query handles but keeps memos), support LRU eviction (100 memo cap, 10MB budget), and can be explicitly deleted when stale. No document needs to be loaded to use memos.

The Role of the LLM

The LLM does reasoning, not code generation:

  1. Understands intent - Interprets "total of north sales" as needing grep + filter + sum
  2. Chooses operations - Decides which Nucleus commands achieve the goal
  3. Verifies results - Checks if the current results answer the query
  4. Iterates - Refines search if results are too broad or narrow

The LLM never writes JavaScript. It outputs Nucleus commands that Lattice executes safely.

Installation

Install from npm:

npm install -g matryoshka-rlm

Or run without installing:

npx matryoshka-rlm "How many ERROR entries are there?" ./server.log

Included Tools

The package provides several CLI tools:

Command Description
rlm Main CLI for document analysis with LLM reasoning
rlm-mcp MCP server with full RLM + LLM orchestration (analyze_document tool)
lattice-mcp MCP server exposing direct Nucleus commands (no LLM required)
lattice-repl Interactive REPL for Nucleus commands
lattice-http HTTP server for Nucleus queries
lattice-pipe Pipe adapter for programmatic access
lattice-setup Setup script for Claude Code integration

From Source

git clone https://github.com/yogthos/Matryoshka.git
cd Matryoshka
npm install
npm run build

Configuration

Copy config.example.json to config.json and configure your LLM provider:

{
  "llm": {
    "provider": "ollama"
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "model": "qwen3-coder:30b",
      "options": { "temperature": 0.2, "num_ctx": 8192 }
    },
    "deepseek": {
      "baseUrl": "https://api.deepseek.com",
      "apiKey": "${DEEPSEEK_API_KEY}",
      "model": "deepseek-chat",
      "options": { "temperature": 0.2 }
    }
  },
  "rlm": {
    "maxTurns": 10
  }
}

Usage

CLI

# Basic usage
rlm "How many ERROR entries are there?" ./server.log

# With options
rlm "Count all ERROR entries" ./server.log --max-turns 15 --verbose

# See all options
rlm --help

MCP Integration

RLM includes lattice-mcp, an MCP (Model Context Protocol) server for direct access to the Nucleus engine. This allows coding agents to analyze documents with 80%+ token savings compared to reading files directly.

The key advantage is handle-based results: query results are stored server-side in SQLite, and the agent receives compact stubs like $grep_error: Array(1000) [preview...] instead of full data. Handle names are derived from the command for easy identification. Operations chain server-side without roundtripping data.

Available Tools

Tool Description
lattice_load Load a document for analysis
lattice_query Execute Nucleus commands on the loaded document
lattice_expand Expand a handle to see full data (with optional limit/offset)
lattice_memo Store arbitrary context as a memo handle (no document required)
lattice_memo_delete Delete a stale memo to free memory
lattice_close Close the session and free memory
lattice_status Get session status, document info, and memo usage
lattice_bindings Show current variable bindings and memo labels
lattice_reset Reset all bindings and memos but keep document loaded
lattice_llm_respond Respond to a pending (llm_query ...) suspension
lattice_llm_batch_respond Respond to a pending (llm_batch ...) suspension with all N responses
lattice_help Get Nucleus command reference

Example MCP config

{
  "mcp": {
    "lattice": {
      "type": "stdio",
      "command": "lattice-mcp"
    }
  }
}

Efficient Usage Pattern

1. lattice_load("/path/to/large-file.txt")   # Load document (use for >500 lines)
2. lattice_query('(grep "ERROR")')           # Search โ†’ $grep_error: Array(500) [preview]
3. lattice_query('(filter RESULTS ...)')     # Narrow โ†’ $filter_timeout: Array(50) [preview]
4. lattice_query('(count RESULTS)')          # Count without seeing data โ†’ 50
5. lattice_expand("$filter_timeout", limit=10) # Expand only what you need to see
6. lattice_close()                           # Free memory when done

Token efficiency tips:

  • Query results return descriptive handle stubs, not full data
  • Use lattice_expand with limit to see only what you need
  • Chain grep โ†’ filter โ†’ count/sum to refine progressively
  • Use RESULTS in queries (always points to last result)
  • Use descriptive handle names (e.g., $grep_error) with lattice_expand to inspect specific results

Chunking and Sub-LLM Recursion

Two primitive families power the paper's ฮฉ(|P|ยฒ) semantic-horizon pattern:

Chunking โ€” pre-slice a document that's too big to map over directly:

(chunk_by_size 2000)                ; 2000-character slices
(chunk_by_lines 100)                ; 100-line slices
(chunk_by_regex "\\n\\n")           ; Split on blank lines; capture groups ignored

Sub-LLM calls โ€” (llm_query ...) invokes a sub-LLM with an interpolated prompt. Works at the top level and nested inside map / filter / reduce lambdas:

(llm_query "Summarize this")                                         ; bare
(llm_query "Classify: {items}" (items RESULTS))                      ; with binding
(map (chunk_by_lines 100)
     (lambda c (llm_query "summarize: {chunk}" (chunk c))))           ; OOLONG
(filter RESULTS (lambda x (match (llm_query "keep?: {item}" (item x)) "keep" 0)))

The last two patterns fire one sub-LLM call per item โ€” classification or summarization over an entire document, one chunk at a time, without pulling any of it into the root model's context.

Batched sub-LLM โ€” when per-item calls are independent, llm_batch collapses N serial suspensions into one:

(llm_batch RESULTS (lambda x (llm_query "tag: {item}" (item x))))

Same surface syntax as map + llm_query, but fires a single [LLM_BATCH_REQUEST id=... count=N] suspension. The client replies once with a JSON array of N responses via lattice_llm_batch_respond. ~92% round-trip reduction on N=12, ~99% on N=100.

Constrain responses with (one_of ...) for classification tasks:

(llm_batch RESULTS
  (lambda x (llm_query "Rate: {item}" (item x)
                       (one_of "low" "medium" "high"))))

Validates responses case-insensitively against the allowed values, making downstream (filter ...) / (count ...) reliable without re-normalizing free-text output.

Add (calibrate) for subjective-judgment tasks:

(llm_batch RESULTS
  (lambda x (llm_query "Rate: {item}" (item x)
                       (one_of "low" "medium" "high")
                       (calibrate))))

Asks the model to scan all N prompts and establish a consistent relative scale before answering any. Useful when ratings depend on the distribution of the corpus rather than being absolute.

Multi-turn suspension protocol (works with any MCP client):

When (llm_query ...) is evaluated, execution suspends and returns a [LLM_QUERY_REQUEST id=...] message. The MCP client responds via lattice_llm_respond to resume execution. For queries with multiple llm_query calls (e.g., inside map), each item triggers one suspension โ€” respond to each in turn until the final handle stub or scalar is returned. No special client capabilities (like sampling) are required.

For the native recursive sub-RLM implementation, use runRLMFromContent(query, content, { subRLMMaxDepth: 1 }) directly from the programmatic API โ€” see the Programmatic section below.

Memory Pad Usage

1. lattice_memo(content="<file summary>", label="auth module")  โ†’ $memo_auth_module stub
2. lattice_memo(content="<analysis>", label="perf bottlenecks") โ†’ $memo_perf_bottlenecks stub
3. # ... many turns later, need the auth context ...
4. lattice_expand("$memo_auth_module")                          โ†’ Full summary
5. lattice_memo_delete("$memo_auth_module")                     โ†’ Drop when stale

Memos don't require a loaded document โ€” they create a session automatically. Limits: 100 memos, 10MB total. Oldest evicted when exceeded.

Programmatic

import { runRLM } from "matryoshka-rlm/rlm";
import { createLLMClient } from "matryoshka-rlm";

const llmClient = createLLMClient("ollama", {
  baseUrl: "http://localhost:11434",
  model: "qwen3-coder:30b",
  options: { temperature: 0.2 }
});

const result = await runRLM("How many ERROR entries are there?", "./server.log", {
  llmClient,
  maxTurns: 10,
  turnTimeoutMs: 30000,
});

Nucleus DSL Reference

Search Commands

(grep "pattern")              ; Regex search, returns matches with line numbers
(fuzzy_search "query" 10)     ; Fuzzy search, returns top N matches with scores
(bm25 "query terms" 10)      ; BM25 ranked keyword search (TF-IDF scoring)
(semantic "query terms" 10)   ; TF-IDF cosine similarity search
(text_stats)                  ; Document metadata (length, line count, samples)
(lines 10 20)                 ; Get specific line range (1-indexed)

Multi-Signal Fusion & Ranking

Combine results from multiple search operations for better relevance:

;; Reciprocal Rank Fusion โ€” merge results from different search signals
(fuse (grep "ERROR") (bm25 "error handling") (semantic "failure"))

;; Gravity dampening โ€” halve scores for false positives lacking query term overlap
(dampen (bm25 "database error") "database error")

;; Q-value reranking โ€” learns which lines are useful across turns
(rerank (fuse (grep "ERROR") (bm25 "error")))

;; Full pipeline: fuse โ†’ dampen โ†’ rerank
(rerank (dampen (fuse (grep "ERROR") (bm25 "error") (semantic "failure")) "error"))

Symbol Operations (Code Files)

For code files, Lattice uses tree-sitter to extract structural symbols. This enables code-aware queries that understand functions, classes, methods, and other language constructs.

Built-in languages (packages included):

  • TypeScript (.ts, .tsx), JavaScript (.js, .jsx), Python (.py), Go (.go)
  • HTML (.html), CSS (.css), JSON (.json)

Additional languages (install package to enable):

  • Rust, C, C++, Java, Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Bash, SQL, and more
(list_symbols)                ; List all symbols (functions, classes, methods, etc.)
(list_symbols "function")     ; Filter by kind: "function", "class", "method", "interface", "type", "struct"
(get_symbol_body "myFunc")    ; Get source code body for a symbol by name
(get_symbol_body RESULTS)     ; Get body for symbol from previous query result
(find_references "myFunc")    ; Find all references to an identifier

Symbols include metadata like name, kind, start/end lines, and parent relationships (e.g., methods within classes).

Knowledge Graph (Code Structure)

When a code file is loaded, Lattice automatically builds an in-memory knowledge graph that tracks call relationships, inheritance, and interface implementations. This enables structural queries beyond simple text search.

(callers "funcName")            ; Who calls this function?
(callees "funcName")            ; What does this function call?
(ancestors "ClassName")         ; Inheritance chain (extends)
(descendants "ClassName")       ; All subclasses (transitive)
(implementations "IFace")       ; Classes implementing this interface
(dependents "name")             ; All transitive dependents
(dependents "name" 2)           ; Dependents within depth limit
(symbol_graph "name" 1)         ; Neighborhood subgraph around symbol

The graph is built using line-based heuristics (word-boundary matching for calls, syntax pattern matching for extends/implements), so it produces approximate but useful results without requiring a full language server.

Graph Analysis

Community detection and structural insights help you understand codebase architecture:

(communities)                   ; Detect communities with cohesion scores
(community_of "name")           ; Which community does this symbol belong to?
(god_nodes)                     ; Top 10 most-connected nodes (hubs)
(god_nodes 5)                   ; Top N most-connected nodes
(surprising_connections)         ; Cross-community or low-confidence edges
(bridge_nodes)                  ; Nodes bridging different communities
(suggest_questions)             ; Questions the graph can answer
(graph_report)                  ; Full analysis (all of the above)

Adding Language Support

Matryoshka includes built-in symbol mappings for 20+ languages. To enable a language, install its tree-sitter grammar package:

# Enable Rust support
npm install tree-sitter-rust

# Enable Java support
npm install tree-sitter-java

# Enable Ruby support
npm install tree-sitter-ruby

Languages with built-in mappings:

  • TypeScript, JavaScript, Python, Go, Rust, C, C++, Java
  • Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Elixir
  • HTML, CSS, JSON, YAML, TOML, Markdown, SQL, Bash

Once a package is installed, the language is automatically available for symbol extraction.

Custom Language Configuration

For languages without built-in mappings, create ~/.matryoshka/config.json mapping tree-sitter node types to symbol kinds (function, method, class, interface, type, struct, enum, trait, module, variable, constant, property):

{
  "grammars": {
    "ocaml": {
      "package": "tree-sitter-ocaml",
      "extensions": [".ml", ".mli"],
      "moduleExport": "ocaml",
      "symbols": {
        "value_definition": "function",
        "type_definition": "type",
        "module_definition": "module"
      }
    }
  }
}

Use the tree-sitter playground to explore node types for your language.

Control Flow

(if (count RESULTS) (sum RESULTS) 0)  ; Conditional: if/then/else
(add 10 20)                           ; Arithmetic addition

Collection Operations

(filter RESULTS (lambda x (match x "pattern" 0)))  ; Filter by regex
(map RESULTS (lambda x (match x "(\\d+)" 1)))      ; Extract from each
(sum RESULTS)                                       ; Sum numbers in results
(count RESULTS)                                     ; Count items

String Operations

(match str "pattern" 0)       ; Regex match, return group N
(replace str "from" "to")     ; String replacement
(split str "," 0)             ; Split and get index
(parseInt str)                ; Parse integer
(parseFloat str)              ; Parse float

Type Coercion

(parseDate "Jan 15, 2024")           ; -> "2024-01-15"
(parseDate "01/15/2024" "US")        ; -> "2024-01-15" (MM/DD/YYYY)
(parseCurrency "$1,234.56")          ; -> 1234.56
(parseNumber "1,234,567")            ; -> 1234567
(coerce value "date")                ; General coercion (date/currency/number/boolean/string)
(extract str "\\$[\\d,]+" 0 "currency")  ; Extract and coerce in one step

Program Synthesis

The model provides constraints (input/output examples), not code โ€” the synthesizer builds programs automatically using Barliman-style relational synthesis with miniKanren.

; Synthesize from input/output pairs
(synthesize
  ("$100" 100)
  ("$1,234" 1234)
  ("$50,000" 50000))

; Named functions โ€” synthesize once, apply many times
(define-fn "parse_price" (("$100" 100) ("$1,234" 1234)))
(apply-fn "parse_price" "$50,000")    ; -> 50000

; Boolean classifiers from examples
(predicate "is_error" (("ERROR: timeout" true) ("INFO: ok" false)))

Cross-Turn State

Results from previous turns are available:

  • RESULTS - Latest array result (updated by grep, filter)
  • _1, _2, _3, ... - Results from specific turns (1-indexed)

Final Answer

<<<FINAL>>>your answer here<<<END>>>

Development

npm test                              # Run tests
npm test -- --coverage                # With coverage
RUN_E2E=1 npm test -- tests/e2e.test.ts  # E2E tests (requires Ollama)
npm run build                         # Build
npm run typecheck                     # Type check

Acknowledgements

This project incorporates ideas and code from:

  • Ori-Mnemos - A persistent memory infrastructure for AI agents implementing the Recursive Memory Harness framework. BM25 search, Reciprocal Rank Fusion, gravity dampening, and Q-value learning reranking were ported from Ori-Mnemos and adapted for line-based document analysis.
  • Nucleus - A symbolic S-expression language by Michael Whitford. RLM uses Nucleus syntax for the constrained DSL that the LLM outputs, providing a rigid grammar that reduces model errors.
  • ramo - A miniKanren implementation in TypeScript by Will Lewis. Used for constraint-based program synthesis.
  • Barliman - A prototype smart editor by William Byrd and Greg Rosenblatt that uses program synthesis to assist programmers. The Barliman-style approach of providing input/output constraints instead of code inspired the synthesis workflow.
  • tree-sitter - A parser generator tool and incremental parsing library. Used for extracting structural symbols (functions, classes, methods) from code files to enable code-aware queries.

License

Apache-2.0

References

Release History

VersionChangesUrgencyDate
main@2026-05-17Latest activity on main branchHigh5/17/2026
v0.2.9Latest release: v0.2.9High4/9/2026

Dependencies & License Audit

Loading dependencies...

Similar Packages

studioOpen-source control plane for your AI agents. Connect tools, hire agents, track every token and dollarv2.396.1
mcp-ts-coreAgent-native TypeScript framework for building MCP servers. Build tools, not infrastructure.v0.10.0
tekmetric-mcp๐Ÿ” Ask questions about your shop data in natural language and get instant answers about appointments, customers, and repair orders with Tekmetric MCP.main@2026-06-05
exa-mcp-serverExa MCP for web search and web crawling!main@2026-06-04
dbt-mcpA MCP (Model Context Protocol) server for interacting with dbt.v1.20.0

More from yogthos

chiasmusChiasmus is an MCP server that gives language models access to formal verification

More in MCP Servers

PlanExeCreate a plan from a description in minutes
agentroveYour own Claude Code UI, sandbox, in-browser VS Code, terminal, multi-provider support (Anthropic, OpenAI, GitHub Copilot, OpenRouter), custom skills, and MCP servers.
ProxmoxMCP-PlusEnhanced Proxmox MCP server with advanced virtualization management and full OpenAPI integration.
node9-proxyThe Execution Security Layer for the Agentic Era. Providing deterministic "Sudo" governance and audit logs for autonomous AI agents.