freshcrate
Skin:/
Home > Testing > little-coder

little-coder

A coding agent optimized to smaller LLMs

Why this rank:Strong adoptionRecent releaseHealthy release cadence

Description

A coding agent optimized to smaller LLMs

README

little-coder

A Claude Code-inspired CLI coding agent, heavily optimized for small models that run on any modern consumer laptop.

little-coder takes the architecture of a cloud-powered coding assistant and makes it work with 5โ€“25 GB local models served via Ollama or llama.cpp, through skill-augmented tool use, domain-knowledge injection, workspace-aware context discovery, a Write-vs-Edit tool invariant, and a thinking-budget system that prevents reasoning models from hanging while preserving their partial insights.

Headline result: ollama/qwen3.5 (9.7B, 6.6 GB) + little-coder scores 45.56% mean (across two full runs) on the full 225-exercise Aider Polyglot benchmark, running on a consumer laptop with no network calls. On the public leaderboard that sits above gpt-4.5-preview (44.9%) and gpt-oss-120b high (41.8%). A matched-model vanilla Aider baseline reaches 19.11%.

The full narrative โ€” motivation, design, methodology, results, leaderboard comparison, integrity audit, and limitations โ€” is in the white paper at https://itayinbarr.substack.com/p/honey-i-shrunk-the-coding-agent This README is the quick tour: what it looks like, how to run it, and how the repo is laid out. For anything about why the design is the way it is or what the numbers mean, read the paper.


What it looks like

little-coder startup banner

Every time you're about to type, the status line shows how much context you've burned and projects how many more messages you can send before a new session is recommended. Zones at 70% (yellow) and 85% (red) match the threshold that triggers automatic compaction:

context usage and session counter

When you ask little-coder to implement something, the agent uses the workspace-awareness skill to discover any spec file (.docs/instructions.md, AGENTS.md, CLAUDE.md, README.md), reads the stub, and then Edits it in place. On the occasions it tries to Write over an existing file, the tool-level guard refuses and hands the agent the exact Edit recipe for the same path:

tool-use flow

Write guard firing and the Edit recovery

All four screenshots are real Rich-rendered SVG exports regenerated from a local generator script โ€” they update in sync with the codebase.


Quick start

Option A โ€” Ollama (simplest)

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull qwen3.5

# 3. Clone + install little-coder
git clone https://github.com/itayinbarr/little-coder.git
cd little-coder
pip install -e .

# 4. Run
python little_coder.py
# Then in the REPL:  /model ollama/qwen3.5

Option B โ€” llama.cpp (fastest, supports MoE models like Qwen3.6-35B-A3B)

# 1. Build llama.cpp with CUDA (sm_XXX matches your GPU; Blackwell = 120)
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120 -DLLAMA_CURL=ON
cmake --build build --config Release -j

# 2. Fetch a GGUF (example: Qwen3.6-35B-A3B Q4_K_M, 22 GB)
pip install -U "huggingface_hub[cli]"
hf download unsloth/Qwen3.6-35B-A3B-GGUF Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
   --local-dir ~/models

# 3. Serve it (MoE trick: keep experts in RAM, attention on GPU)
build/bin/llama-server -m ~/models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
   --host 127.0.0.1 --port 8888 --jinja \
   -c 16384 -ngl 99 --n-cpu-moe 999 --flash-attn on

# 4. Point little-coder at it
cd /path/to/little-coder && pip install -e .
python little_coder.py --model llamacpp/qwen3.6-35b-a3b
# or for the 9B backup:  --model llamacpp/qwen3.5-9b

Set LLAMACPP_BASE_URL=http://localhost:8888/v1 if you run the server on a different host or port.


Supported models

via llama.cpp (new in v0.0.3)

Model Size Notes
Qwen3.6-35B-A3B 22 GB (Q4_K_M) Sparse MoE, 35B total / 3B active โ€” runs at ~38 tok/s on an 8 GB laptop GPU with --n-cpu-moe 999. Passes tasks that Qwen3.5 9B fails (e.g. book-store) on the first attempt.
Qwen3.5-9B 5.3 GB (Q4_K_M) Dense 9.7B, same model used for the v0.0.2 headline benchmark.

via Ollama

Model Size Notes
Qwen3.5 (default) 6.6 GB 9.7B, thinking + tools, the model the headline 45.56% is from
Gemma4:e4b 9.6 GB 8B, vision + audio capable
Qwen3:8b 5.2 GB 8.2B, thinking + tools
Gemma3:4b 3.3 GB 4B, 8K context, needs all optimizations
Llama 3.2:3b ~2 GB 3B, tight context
Phi4-mini ~3 GB 16K context
Any cloud model โ€” Claude, GPT-4, Gemini โ€” small-model optimizations auto-disabled

CLI reference

python little_coder.py [options]
  --model MODEL        Set the model (e.g. ollama/qwen3.5)
  --permission-mode    auto | accept-all | manual | plan

Key slash commands

Command Description
/model <name> Switch model
/context Show current context usage + message projection
/compact Summarize old messages to free up context
/commit Review and commit changes
/review Code review with structured feedback
/skills List available skills
/memory View persistent memories
/voice Voice input mode
/help Full command reference

Repo layout

little_coder.py          # REPL, slash commands, rendering
agent.py                 # Core agent loop with small-model adaptations
providers.py             # Multi-provider streaming (Ollama, llama.cpp, Anthropic, OpenAI-compat)
tools.py                 # 8 core tools + Write-vs-Edit invariant
tool_registry.py         # Tool registration and dispatch
context.py               # System prompt builder (base + skills + knowledge)
config.py                # Configuration management
compaction.py            # Context window management
workspace.py             # Workspace introspection helpers
memory.py                # Persistent file-based memory

local/                   # Small-model preprocessing pipeline
โ”œโ”€โ”€ config.py            # Per-model profiles (context, tokens, budgets)
โ”œโ”€โ”€ skill_augment.py     # Tool-skill selection and injection
โ”œโ”€โ”€ knowledge_augment.py # Domain-knowledge selection and injection
โ”œโ”€โ”€ context_manager.py   # Prompt compression and message pruning
โ”œโ”€โ”€ quality.py           # Empty / hallucinated / looped response detection
โ”œโ”€โ”€ output_parser.py     # Text-based tool-call extraction + JSON repair
โ””โ”€โ”€ deliberate.py        # Parallel reasoning branches

skill/
โ”œโ”€โ”€ tools/               # Tool usage guidance (8 files)
โ”œโ”€โ”€ knowledge/           # Algorithm + domain reference (13 files)
โ”œโ”€โ”€ loader.py            # Skill file parser
โ”œโ”€โ”€ executor.py          # Skill execution (inline/fork)
โ””โ”€โ”€ builtin.py           # Built-in slash skills

benchmarks/
โ”œโ”€โ”€ aider_polyglot.py              # Multi-language benchmark harness
โ”œโ”€โ”€ polyglot_status.py             # Status dashboard for running benchmarks
โ”œโ”€โ”€ smoke_test_langs.sh            # Reference-solution smoke test per language
โ””โ”€โ”€ results_full_polyglot*.json    # Per-exercise results from full runs

Further reading

  • docs/whitepaper.md โ€” the white paper. Motivation, design philosophy (intern, not senior engineer), methodology, full results, leaderboard comparison, integrity audit, limitations. Start here.
  • docs/benchmark-reproduction.md โ€” two-run reproduction report with per-language statistics, tool-use analysis, intervention metrics, and the runner-degradation investigation.
  • docs/benchmark-baseline-aider.md โ€” vanilla Aider + Qwen3.5 baseline (19.1%) for scaffold-ablation comparison.
  • docs/architecture.md โ€” deep internals for contributors: module dependency graph, tool registry API, skill loader structure.

Citation

If you reference little-coder or its Aider Polyglot result in academic work, please cite the white paper:

@misc{inbar2026littlecoder,
  title        = {little-coder: A Coding Agent Optimized for Small Local Language Models},
  subtitle     = {Architectural Adaptation Lets a 9.7B Model Outperform Frontier Models on Aider Polyglot},
  author       = {Inbar, Itay},
  year         = {2026},
  month        = apr,
  howpublished = {\url{https://github.com/itayinbarr/little-coder/blob/main/docs/whitepaper.md}},
  note         = {White paper}
}

Plain-text form:

Inbar, I. (2026). little-coder: A Coding Agent Optimized for Small Local Language Models. White paper. https://github.com/itayinbarr/little-coder/blob/main/docs/whitepaper.md


Attribution

little-coder is a derivative work based on CheetahClaws / ClawSpring by SafeRL-Lab, licensed under Apache 2.0. The upstream project provided the foundational agent architecture, tool system, multi-provider support, and REPL interface.

little-coder adds significant new systems for small-model optimization: skill-augmented tool use, domain-knowledge injection, workspace awareness, thinking-budget enforcement with reasoning reuse, the Write-vs-Edit tool invariant, model-specific profiles for Qwen3.5 and Gemma4, and a full multi-language benchmark harness.


License

Apache 2.0 โ€” see LICENSE for details.

Release History

VersionChangesUrgencyDate
v1.8.2 ### Fixed - **Minimal user `models.json` entries no longer crash startup with `Cannot read properties of undefined (reading 'input')`** ([#36](https://github.com/itayinbarr/little-coder/issues/36)). The shipped `models.json` declares every field โ€” `id`, `name`, `reasoning`, `input`, `contextWindow`, `maxTokens`, `cost` โ€” but a user override that omitted e.g. `name`/`maxTokens`/`cost` was passed through unchanged to pi's registry, which then exploded deep in `applyModelOverride` when it tried toHigh5/30/2026
v1.8.1 ### Fixed - **`glob` no longer exhausts memory on a recursive search from a huge root.** The tool capped *matches* at 500 but never bounded the *walk*: run from a home directory (or any tree with macOS `Library`, caches, or `node_modules`), `fs.glob` recursively descended everything and its internal traversal state grew until the Node **process** ran out of heap โ€” a host-memory crash (`Ineffective mark-compacts near heap limit`), entirely distinct from the model's *context window* (the read-guaHigh5/23/2026
v1.4.3 Follow-up to v1.4.2: clean up two cosmetic regressions that the @earendil-works scope migration surfaced. ### Fixed - **Pi's `What's New` block no longer appears inside little-coder's TUI after a version bump.** Root cause: pi's interactive mode reads its own bundled `CHANGELOG.md` on startup and renders every entry strictly newer than the `lastChangelogVersion` field in `~/.pi/agent/settings.json` (`interactive-mode.js:getChangelogForDisplay`). v1.4.2 jumped the bundled pi from 0.68.1 to 0.75High5/19/2026
v1.2.0 Issue-cleanup release that also ships built-in LM Studio support. Closes [#17](https://github.com/itayinbarr/little-coder/issues/17) (Windows), [#19](https://github.com/itayinbarr/little-coder/issues/19) (phantom Agent tool), [#21](https://github.com/itayinbarr/little-coder/issues/21) (skill param mismatch). ### Added - **Built-in `lmstudio/local-model` provider.** [LM Studio](https://lmstudio.ai/) exposes an OpenAI-compatible server on `http://127.0.0.1:1234/v1` by default, and previously theHigh5/13/2026
v1.1.0 Issue-cleanup release. Three small features and one bug fix, driven by GitHub issues #12 / #13 / #15 / #16. ### Added - **`models.json` is now the canonical provider registration.** ([#13](https://github.com/itayinbarr/little-coder/issues/13)) Previously `.pi/extensions/llama-cpp-provider/index.ts` hardcoded the model list and `models.json` was decorative; editing it had no effect. Now the extension loads providers and models from `models.json` at startup and registers them dynamically. **UsHigh5/3/2026
v1.0.3README and install.sh now lead with `little-coder --model llamacpp/qwen3.6-35b-a3b` as the canonical example. That's the configuration little-coder is tuned for: small local model + custom scaffolding. Cloud models (Anthropic, OpenAI) move into the secondary list. No code changes โ€” purely a docs change. ## Update ``` npm install -g little-coder@1.0.3 ``` (Or wait for the in-launcher prompt next time you run `little-coder`.)High4/28/2026
main@2026-04-23Latest activity on main branchHigh4/23/2026
v0.0.4Latest release: v0.0.4High4/21/2026
main@2026-04-21Latest activity on main branchHigh4/21/2026
v0.0.3Tag v0.0.3High4/21/2026

Dependencies & License Audit

Loading dependencies...

Similar Packages

tsunamiautonomous AI agent that builds full-stack apps. local models. no cloud. no API keys. runs on your hardware.main@2026-04-28
vector-db-benchmarkFramework for benchmarking vector search enginesmaster@2026-06-05
GitoAn AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issuesโ€”such as security vulnerabilities, bugs, and maintainability concerns.v4.1.0
ISC-BenchInternal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.v0.0.6
OpenClawProBenchOpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.main@2026-05-19

More in Testing

vector-db-benchmarkFramework for benchmarking vector search engines
GitoAn AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issuesโ€”such as security vulnerabilities, bugs, and maintainability concerns.
mxcliMendix cli tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a starlark linter for quality v
llm_context_benchmarks ๐Ÿ“Š LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz