headroom

The Context Optimization Layer for LLM Applications

agent ai anthropic compression context-engineering context-window fastapi langchain mcp python

Why this rank: Strong adoption · Recent release · Healthy release cadence

README

Headroom

Compress everything your AI agent reads. Same answers, fraction of the tokens.

Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal — losslessly, locally, and without touching accuracy.

100 logs. One FATAL error buried at position 67. Both runs found it. Baseline 10,144 tokens → Headroom 1,260 tokens: 87% fewer, identical answer.

python examples/needle_in_haystack_test.py

Quick start

Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.

Wrap your coding agent — one command:

pip install "headroom-ai[all]"

headroom wrap claude      # Claude Code
headroom wrap codex       # Codex
headroom wrap cursor      # Cursor
headroom wrap aider       # Aider
headroom wrap copilot     # GitHub Copilot CLI

Drop it into your own code — Python or TypeScript:

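from anthropic import Anthropic
from headroom import compress

client = Anthropic()  # reads ANTHROPIC_API_KEY; `messages` is your usual message list
result = compress(messages, model="claude-sonnet-4-5")
# max_tokens is required by the Anthropic Messages API
response = client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")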

import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });

Or run it as a proxy — zero code changes, any language:

headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
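
The same two environment variables can also be set programmatically before launching a child process. The following is a minimal sketch using only the standard library; the helper name `proxy_env` and its defaults are illustrative choices for this example, not part of Headroom.

```python
import os


def proxy_env(port: int = 8787, host: str = "localhost") -> dict[str, str]:
    """Return a copy of the current environment that points the Anthropic and
    OpenAI SDKs at a local proxy (hypothetical helper, not a Headroom API)."""
    base = f"http://{host}:{port}"
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base       # Anthropic route: no path suffix
    env["OPENAI_BASE_URL"] = f"{base}/v1"  # OpenAI route: /v1 suffix
    return env


env = proxy_env()
print(env["ANTHROPIC_BASE_URL"], env["OPENAI_BASE_URL"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting your application, assuming the proxy is already listening on that port.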

Why Headroom

Accuracy-preserving. GSM8K 0.870 → 0.870 (±0.000). TruthfulQA +0.030. SQuAD v2 and BFCL both 97% accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
Runs on your machine. No cloud API, no data egress. Compression latency is milliseconds — faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
Kompress-base on HuggingFace. Our open-source text compressor, fine-tuned on real agentic traces — tool outputs, logs, RAG chunks, code. Install with pip install "headroom-ai[ml]".
Cross-agent memory and learning. Claude Code saves a fact, Codex reads it back. headroom learn mines failed sessions and writes corrections straight to CLAUDE.md / AGENTS.md / GEMINI.md — reliability compounds over time.
Reversible (CCR). Compression is not deletion. The model can always call headroom_retrieve to pull the original bytes. Nothing is thrown away.
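
To make the idea of reversible compression concrete, here is a toy sketch: the original text is kept in a local store under a content hash, and the prompt carries only a short stub plus that key. Everything in it (the `ToyStore` class, the stub format, plain truncation standing in for a compressor) is a hypothetical illustration, not Headroom's CCR implementation or the behavior of headroom_retrieve.

```python
import hashlib


class ToyStore:
    """In-memory store mapping a content hash to the original text."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def compress(self, text: str, keep: int = 40) -> str:
        """Replace text with a short stub that embeds a retrieval key."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text  # the original is stored, never discarded
        return f"{text[:keep]}... [ref:{key}, {len(text)} chars]"

    def retrieve(self, key: str) -> str:
        """Return the exact original text for a key produced by compress()."""
        return self._originals[key]


store = ToyStore()
original = "ERROR " + "x" * 500
stub = store.compress(original)
key = stub.split("[ref:")[1].split(",")[0]  # parse the key back out of the stub

assert store.retrieve(key) == original  # the round trip recovers the full text
assert len(stub) < len(original)        # while the prompt carries far less
```

The point of the sketch is the contract rather than the compressor: whatever is shortened in the prompt stays recoverable through a key the model can hand back.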

Bundles the RTK binary for shell-output rewriting — full attribution below.

How it fits

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ───────────────────────────────────────────────   │
    │  CacheAligner  →  ContentRouter  →  CCR             │
    │                    ├─ SmartCrusher   (JSON)         │
    │                    ├─ CodeCompressor (AST)          │
    │                    └─ Kompress-base  (text, HF)     │
    │                                                     │
    │  Cross-agent memory  ·  headroom learn  ·  MCP      │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

→ Architecture · CCR reversible compression · Kompress-base model card
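
The routing step in the diagram can be pictured as a small dispatcher that inspects each piece of content and picks a compressor. The sketch below rests on assumptions: the detection heuristics and the `route` function are invented for this example, only Python source is recognized as code, and the returned labels merely echo the component names in the diagram. It does not describe Headroom's actual ContentRouter logic.

```python
import ast
import json

# Top-level statement types that we treat as evidence of Python source code.
CODE_NODES = (
    ast.FunctionDef,
    ast.AsyncFunctionDef,
    ast.ClassDef,
    ast.Import,
    ast.ImportFrom,
)


def route(content: str) -> str:
    """Pick a compressor label for a piece of content (illustrative heuristics)."""
    stripped = content.strip()
    # Structured data: anything that parses as a JSON object or array.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "SmartCrusher"
        except json.JSONDecodeError:
            pass
    # Source code: a successful parse with at least one def/class/import.
    try:
        tree = ast.parse(stripped)
        if any(isinstance(node, CODE_NODES) for node in tree.body):
            return "CodeCompressor"
    except (SyntaxError, ValueError):
        pass
    # Everything else is treated as free text.
    return "Kompress-base"


print(route('[{"id": 1}, {"id": 2}]'))       # SmartCrusher
print(route("def f(x):\n    return x + 1"))  # CodeCompressor
print(route("2024-01-01 FATAL disk full"))   # Kompress-base
```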

Proof

Savings on real agent workloads:

Workload	Before (tokens)	After (tokens)	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%
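
The Savings column follows directly from the before and after token counts. A quick check using only the figures in the table above (the helper name `savings_pct` is ours):

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens removed: (before - after) / before * 100."""
    return (before - after) / before * 100


workloads = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}

for name, (before, after) in workloads.items():
    # Rounds to 92%, 92%, 73%, and 47%, matching the table.
    print(f"{name}: {savings_pct(before, after):.0f}%")
```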

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta / compression
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%	19% compression
BFCL	Tools	100	—	97%	32% compression

Reproduce:

python -m headroom.evals suite --tier 1

Community, live:

60B+ tokens saved by the community in the last 20 days — live leaderboard →

→ Full benchmarks & methodology

Built for coding agents

Agent	One-command wrap	Notes
Claude Code	`headroom wrap claude`	`--memory` for cross-agent memory, `--code-graph` for codebase intel
Codex	`headroom wrap codex --memory`	Shares the same memory store as Claude
Cursor	`headroom wrap cursor`	Prints Cursor config — paste once, done
Aider	`headroom wrap aider`	Starts proxy, launches Aider
Copilot CLI	`headroom wrap copilot`	Starts proxy, launches Copilot
OpenClaw	`headroom wrap openclaw`	Installs Headroom as ContextEngine plugin

MCP-native too — headroom mcp install exposes headroom_compress, headroom_retrieve, and headroom_stats to any MCP client.

Integrations

Drop Headroom into any stack

Your setup	Hook in with
Any Python app	`compress(messages, model=…)`
Any TypeScript app	`await compress(messages, { model })`
Anthropic / OpenAI SDK	`withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())`
Vercel AI SDK	`wrapLanguageModel({ model, middleware: headroomMiddleware() })`
LiteLLM	`litellm.callbacks = [HeadroomCallback()]`
LangChain	`HeadroomChatModel(your_llm)`
Agno	`HeadroomAgnoModel(your_model)`
Strands	Strands guide
ASGI apps	`app.add_middleware(CompressionMiddleware)`
Multi-agent	`SharedContext().put / .get`
MCP clients	`headroom mcp install`

What's inside

SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
Kompress-base — our HuggingFace model, trained on agentic traces.
Image compression — 40–90% reduction via trained ML router.
CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
IntelligentContext — score-based context fitting with learned importance.
CCR — reversible compression; LLM retrieves originals on demand.
Cross-agent memory — shared store, agent provenance, auto-dedup.
SharedContext — compressed context passing across multi-agent workflows.
headroom learn — plugin-based failure mining for Claude, Codex, Gemini.
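
Why prefix stability matters for the CacheAligner item: provider-side prompt caches match on the leading part of a request, so a single volatile value near the top invalidates everything after it. The toy below illustrates that general principle only. The functions `build_prompt` and `prefix_hash` are hypothetical, a SHA-256 of the first characters stands in for a cache key, and none of this is Headroom's CacheAligner code.

```python
import hashlib


def build_prompt(static_instructions: str, volatile: str, aligned: bool) -> str:
    """Assemble a prompt; when aligned, volatile text goes after the static part."""
    if aligned:
        return f"{static_instructions}\n{volatile}"
    return f"{volatile}\n{static_instructions}"


def prefix_hash(prompt: str, length: int) -> str:
    """Hash of the first `length` characters, standing in for a cache key."""
    return hashlib.sha256(prompt[:length].encode("utf-8")).hexdigest()


static = "You are a helpful SRE assistant. Follow the runbook strictly."
n = len(static)

# Two requests that differ only in a timestamp.
unaligned = [build_prompt(static, f"Now: 12:00:0{i}", aligned=False) for i in (1, 2)]
aligned = [build_prompt(static, f"Now: 12:00:0{i}", aligned=True) for i in (1, 2)]

print(prefix_hash(unaligned[0], n) == prefix_hash(unaligned[1], n))  # False
print(prefix_hash(aligned[0], n) == prefix_hash(aligned[1], n))      # True
```

With the timestamp first, the two requests share no common prefix; with the static instructions first, the whole instruction block is byte-identical across requests and can be reused.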

Install

pip install "headroom-ai[all]"          # Python, everything
npm  install headroom-ai                # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.

→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

Documentation

Start here	Go deeper
Quickstart	Architecture
Proxy	How compression works
MCP tools	CCR — reversible compression
Memory	Cache optimization
Failure learning	Benchmarks
Configuration	Limitations

Compared to

Headroom runs locally, covers every content type (not just CLI or text), works with every major framework, and is reversible.

Tool	Scope	Deploy	Local	Reversible
Headroom	All context — tools, RAG, logs, files, history	Proxy · library · middleware · MCP	Yes	Yes
RTK	CLI command outputs	CLI wrapper	Yes	No
Compresr, Token Co.	Text sent to their API	Hosted API call	No	No
OpenAI Compaction	Conversation history	Provider-native	No	No

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git show → git show --short, noisy ls → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.

Community

Live leaderboard — 60B+ tokens saved and counting.
Discord — questions, feedback, war stories.
Kompress-base on HuggingFace — the model behind our text compression.

License

Apache 2.0 — see LICENSE.

Release History

Version	Changes	Urgency	Date
v0.32.0	## What's Changed * ci(release-please): use a PAT so releases trigger the publish workflows by @chopratejas in https://github.com/headroomlabs-ai/headroom/pull/1920 * ci: bump actions/checkout from 4 to 7 by @dependabot[bot] in https://github.com/headroomlabs-ai/headroom/pull/1414 * fix(memory/sync): make Codex AGENTS.md adapter additive (stop wiping memories) by @abhay-codes07 in https://github.com/headroomlabs-ai/headroom/pull/1674 * fix(proxy): honor x-headroom-base-url on /v1/messages route	High	7/17/2026
v0.29.0	## What's Changed * fix(opencode): route native providers + load transport plugin, fix Serena context by @chopratejas in https://github.com/headroomlabs-ai/headroom/pull/1573 * fix(pricing): resolve MiniMax-M3 (provider prefix + pre-registration) by @shreyassks in https://github.com/headroomlabs-ai/headroom/pull/1186 * fix(learn): aggregate verbosity baselines across projects instead of overwriting by @gglucass in https://github.com/headroomlabs-ai/headroom/pull/1288 * fix: preserve anthropic pa	High	7/3/2026
v0.27.0	## [0.27.0](https://github.com/chopratejas/headroom/compare/v0.26.0...v0.27.0) (2026-06-22) ### Features * cli: add headroom doctor setup diagnostics ([#926](https://github.com/chopratejas/headroom/issues/926)) ([e45cf4e](https://github.com/chopratejas/headroom/commit/e45cf4e0618b4de02608f68c502ac4cf1270eb84)) * cli: add headroom update command and release banner ([#1088](https://github.com/chopratejas/headroom/issues/1088)) ([26be2c3](https://github.com/chopratejas/headroom/commit/26	High	6/22/2026
v0.26.0	## [0.26.0](https://github.com/chopratejas/headroom/compare/v0.25.0...v0.26.0) (2026-06-16) ### Features * add Copilot BYOK provider wrapper utilities and CLI support ([#1041](https://github.com/chopratejas/headroom/issues/1041)) ([e67ee2a](https://github.com/chopratejas/headroom/commit/e67ee2af658bce35fb4c71b45a0c5b294d7dcfdc)) * add dashboard agent usage stats ([#814](https://github.com/chopratejas/headroom/issues/814)) ([6d3f39f](https://github.com/chopratejas/headroom/commit/6d3f39f213f4e	High	6/16/2026
v0.25.0	## [0.25.0](https://github.com/chopratejas/headroom/compare/v0.24.0...v0.25.0) (2026-06-12) ### Features * add differential network capture harness ([#761](https://github.com/chopratejas/headroom/issues/761)) ([11ab5f8](https://github.com/chopratejas/headroom/commit/11ab5f83a1ccd617a2608349a42feff7f7e72b98)) * add light mode for dashboard ([#834](https://github.com/chopratejas/headroom/issues/834)) ([c425893](https://github.com/chopratejas/headroom/commit/c425893d123e67c62ee20ff64ae350eb4ea56	High	6/12/2026
v0.24.0	## [0.24.0](https://github.com/chopratejas/headroom/compare/v0.23.0...v0.24.0) (2026-06-08) ### Features * perf: add --format {text,json,csv} to `headroom perf` ([#648](https://github.com/chopratejas/headroom/issues/648)) ([9fe4886](https://github.com/chopratejas/headroom/commit/9fe4886cf6b612452f7271d3204872f804074c1f)) * proxy: show resolved upstream API targets in startup banner ([#586](https://github.com/chopratejas/headroom/issues/586)) ([8dbe7ad](https://github.com/chopratejas/h	High	6/9/2026
v0.22.4	## What's Changed * fix(cli): wrap CLI breadth — cline, continue, goose, openhands by @chopratejas in https://github.com/chopratejas/headroom/pull/492 * fix(subscription): wire tokens_saved_rtk data plane by @chopratejas in https://github.com/chopratejas/headroom/pull/493 * fix(observability): RTK metrics + Rust observability (Phase H blocker) by @chopratejas in https://github.com/chopratejas/headroom/pull/494 * ci(release): adopt release-please for gated publishes by @chopratejas in https://git	High	6/1/2026
v0.22.2	## [0.22.2] - 2026-05-20 ### Bug Fixes - memory: expose memory IDs in auto-tail + memory_list tool + ID-usage guidance (f844f64)	High	5/20/2026
v0.21.36	## [0.21.36] - 2026-05-14 ### Documentation - improve discoverability for AI agents and search crawlers (bcf5517)	High	5/14/2026
v0.21.6	## [0.21.6] - 2026-05-08 ### Bug Fixes - ci: update sdist license-packaging invariant test to match new shape (cd89a82)	High	5/8/2026
v0.20.10	## [0.20.10] - 2026-05-02 ### Bug Fixes - proxy: cache concurrency lock, multi-worker docs, bounded compre… (51eeaf6)	High	5/2/2026
v0.10.17	## [0.10.17] - 2026-04-26 ### Other Changes - rust stage 3 (a+b): diff_compressor port + retire python via pyo3 + dotenv test hygiene (fcef84f)	High	4/26/2026
v0.9.2	## [0.9.2] - 2026-04-22 ### CI/CD - publish Python distributions to GitHub releases (5738339) ### Other Changes - Sync plugins to 0.9.2, pyproject canonical at 0.9.1 [skip ci] (7ed2b0b)	High	4/22/2026
v0.8.3	## [0.8.3] - 2026-04-21 ### Bug Fixes - onnx: reduce retained cpu memory (80920ed)	High	4/21/2026
v0.8.2	## [0.8.2] - 2026-04-21	High	4/21/2026
v0.8.1	## [0.8.1] - 2026-04-21 ### Bug Fixes - telemetry: add headroom_stack and install_mode identity fields (b789dca) ### Documentation - port Docker-native, filesystem-contract, and persistent-installs pages to Fumadocs (724c298)	High	4/21/2026
v0.8.0	## [0.8.0] - 2026-04-21 ### Features - bundle ast-grep/difftastic/scc + generic tool_result interceptor framework (21d909a)	High	4/21/2026
v0.7.4	## [0.7.4] - 2026-04-21 ### Other Changes - Fix OpenClaw GitHub Packages build (043045f)	High	4/21/2026
v0.7.3	## [0.7.3] - 2026-04-21	High	4/21/2026
v0.7.2	## [0.7.2] - 2026-04-21 ### Bug Fixes - complete fork-friendly release publishing (16d4608)	High	4/21/2026
v0.7.1	## [0.7.1] - 2026-04-20 ### Chores - memory: add EXTERNAL backend extension points (5391761)	High	4/20/2026
v0.7.0	## [0.7.0] - 2026-04-20 ### Features - add Pi/Codex and Cloud Code Assist compatibility routes (e6cdc2f)	High	4/20/2026
v0.6.7	## [0.6.7] - 2026-04-20 ### Bug Fixes - release: resolve openclaw dependency and boost test coverage (7495687) - ci: add --allow-same-version to npm version in release workflow (2e20849) - proxy: Codex reconnect-storm resilience — bounded pre-upstream + WS session tracking + stage timings (cd11acc) - publish docker and package artifacts in release flow (eff1616)	High	4/20/2026
v0.6.6	## [0.6.6] - 2026-04-20 ### Chores - proxy: add third-party extension point (3373dab)	High	4/20/2026
v0.6.5	## [0.6.5] - 2026-04-19 ### Other Changes - Update documentation links to new domain (31ea3a1)	High	4/19/2026
v0.6.4	## [0.6.4] - 2026-04-19 ### Bug Fixes - support OAuth Bearer token routing for proxy endpoints (956fe40)	High	4/19/2026
v0.6.3	## [0.6.3] - 2026-04-18 ### Documentation - fix Kompress-base HuggingFace link (e4f3cf7)	High	4/18/2026
v0.6.2	## [0.6.2] - 2026-04-18 ### Performance - add compress_system_messages and min_tokens_to_compress config (3242efe) ### Documentation - rewrite README for clarity and highlight Kompress-base, leaderboard, RTK (2593ff2)	High	4/18/2026
v0.6.1	## [0.6.1] - 2026-04-17	High	4/17/2026
v0.6.0	## [0.6.0] - 2026-04-17 ### Features - wire kompress_model through compress pipeline (1039f66)	High	4/17/2026
v0.5.28	## [0.5.28] - 2026-04-17	High	4/17/2026
v0.5.27	## [0.5.27] - 2026-04-17 ### Bug Fixes - take highest release bump across unreleased commits (9469bcb)	High	4/17/2026
v0.5.26	## [0.5.26] - 2026-04-17 ### Features - canonical HEADROOM_CONFIG_DIR and HEADROOM_WORKSPACE_DIR filesystem contract (cb7bc8c) ### Bug Fixes - run release versioning without package imports (4651f96) - strip accept-encoding from forwarded proxy headers (01fa4aa) - restore semantic release versioning (e2ca3ca)	High	4/17/2026
v0.5.25.3	## [0.5.25.3] - 2026-04-16	High	4/16/2026
v0.5.25.2	## [0.5.25.2] - 2026-04-16	High	4/16/2026
v0.5.25.1	## [0.5.25.1] - 2026-04-16	High	4/16/2026
v0.5.20	Release 0.5.20	High	4/8/2026
v0.5.2	## What's Changed - Fix: Cache-aware cost pricing — Dashboard "Input cost" now uses LiteLLM's native cache pricing (cache reads at 10%, cache writes at 125% for Anthropic) instead of list price for all tokens. Previously overstated input costs significantly. - Dashboard clarity — Added note that cost estimates cover message tokens only (excludes system prompt & tool definitions).	Low	3/20/2026
v0.3.7	## What's New in v0.3.7 ### New Features - any-llm backend — Route requests through 38+ LLM providers (OpenAI, Mistral, Groq, Ollama, etc.) via [any-llm](https://mozilla-ai.github.io/any-llm/providers/) - Enable with `--backend anyllm --anyllm-provider <provider>` - Install with: `pip install 'headroom-ai[anyllm]'` - IntelligentContextManager — Semantic-aware context management with multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, fo	Low	2/24/2026
v0.3.0	## What's New ### Bedrock Backend Fix - Use inference profiles (`us.anthropic.*`) for Claude 4+ models - Add dummy API key support for Claude Code users - Clear setup instructions for VS Code ### CLI Improvements - New Click-based CLI with memory management commands - Commands: `headroom memory list\|show\|edit\|delete\|prune\|purge\|export\|import` ### Cloud Provider Support - AWS Bedrock via LiteLLM - Google Vertex AI support - Azure OpenAI support ### Bug Fixes - Fix asyncio event loop error i	Low	1/31/2026
v0.2.15	## Headroom Demo Video Watch Headroom in action - analyzing its own codebase with Claude Code while demonstrating massive token savings. What you'll see: - Real-time token optimization during multi-tool agent conversations - Intelligent context compression preserving critical information - Significant cost savings when working with large codebases Download the video or watch the README for the embedded demo.	Low	1/20/2026

Similar Packages

agentrove (v0.1.56): Your own Claude Code UI, sandbox, in-browser VS Code, terminal, multi-provider support (Anthropic, OpenAI, GitHub Copilot, OpenRouter), custom skills, and MCP servers.

developers-guide-to-ai (main@2026-07-24): The Developer's Guide to AI - A Field Guide for the Working Developer

fcpxml-mcp-server (v0.13.0): 🎬 The first AI-powered MCP server for Final Cut Pro XML. Control your edits with natural language.

claude-code-plugins-plus-skills (@intentsolutionsio/intent-labs-pack@0.1.0): 423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.

sinain-hud (macos-v0.13.0): Ambient intelligence that sees what you see, hears what you hear, and acts on your behalf

More in MCP Servers

superset: Code Editor for the AI Agents Era - Run an army of Claude Code, Codex, etc. on your machine

kreuzberg: A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 91+ formats. Available for Rust, Python

ai-engineering-from-scratch: Learn it. Build it. Ship it for others.

CodeGraphContext: An MCP server plus a CLI tool that indexes local code into a graph database to provide context to AI assistants.