Official mascot design by Matias Mesa.
The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.
23 built-in tools • 8 LLM providers • 5-tier memory • 24/7 autonomous • $0 local mode
ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, DeepSeek, Kimi via Moonshot, and Z.AI GLM) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, use cloud API keys when you need them, or connect native OpenAI Codex subscription auth and Z.AI Coding Plan inside the ForgeGod CLI.
`pip install forgegod`

Audit note (re-verified 2026-04-08): the verified baseline now includes 23 registered tools, 8 provider families, 9 route surfaces, 503 collected tests, 418 non-stress tests passing by default plus 1 opt-in Docker strict integration test, 84/84 stress tests passing, green lint, and a green build. `forgegod loop` no longer auto-commits or auto-pushes by default. Read docs/AUDIT_2026-04-07.md, docs/OPERATIONS.md, and docs/WEB_RESEARCH_2026-04-07.md before making runtime changes.
Every other coding CLI uses one model at a time and resets to zero each session. ForgeGod doesn't.
| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | experimental |
| Stress tested + benchmarked | - | - | - | - | audited baseline |
Scaffolding adds ~11 points on SWE-bench: harness engineering matters as much as the model. ForgeGod is the harness:
- Ralph Loop – 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
- 5-Tier Memory – Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
- Reflexion Coder – 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). The repo now wires workspace scoping, command auditing, blocked paths, and generated-code warnings into runtime, while the audit tracks the remaining hardening gaps.
- DESIGN.md Native – Import a design preset, drop `DESIGN.md` in the repo root, and frontend tasks inherit that design language automatically.
- Contribution Mode – Read `CONTRIBUTING.md`, inspect the repo, surface approachable issues, and plan or execute contribution-sized changes with repo-specific guardrails.
- SICA – Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. Safety guardrails and audit policy keep that loop honest.
- Budget Modes – `normal` → `throttle` → `local-only` → `halt`. Auto-triggered by spend. Run forever on Ollama for $0.
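The Reflexion Coder's escalation ladder boils down to a loop like the sketch below. This is illustrative only, not ForgeGod's internal API: `reflexion_generate` and its callbacks are hypothetical names, and the model IDs are examples from this README.

```python
# Illustrative sketch of 3-attempt Reflexion escalation: each failed attempt
# feeds its critique back into the next try, and the model tier escalates
# local -> cloud -> frontier. Names here are hypothetical, not ForgeGod's API.
ESCALATION = ["ollama:qwen3-coder-next", "openai:gpt-4o-mini", "openai:gpt-4o"]

def reflexion_generate(task, generate, validate, max_attempts=3):
    """generate(task, model, feedback) -> code; validate(code) -> (ok, critique)."""
    feedback = None
    for attempt, model in enumerate(ESCALATION[:max_attempts]):
        code = generate(task, model, feedback)
        ok, critique = validate(code)
        if ok:
            return code, model, attempt + 1
        feedback = critique  # Reflexion: carry the failure into the next attempt
    return None, ESCALATION[max_attempts - 1], max_attempts
```

The key design point is that retries are not blind re-rolls: the critique from each failed validation travels forward, so the stronger model sees why the cheaper one failed.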
You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.
- Install Ollama: https://ollama.com/download
- Pull a model: `ollama pull qwen3.5:9b`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod init` (interactive wizard guides you)
- Try it: `forgegod run "Create a simple website with a contact form"`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod auth login openai-codex`
- Run: `forgegod auth sync`
- Try it: `forgegod plan "Build a REST API with user authentication"`
ForgeGod stays the entrypoint. It delegates the one-time login to the official Codex auth flow, then keeps day-to-day usage inside ForgeGod CLI.
- Export `ZAI_CODING_API_KEY=...`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod auth sync`
- Try it: `forgegod run "Build a REST API with user authentication"`
For the strongest current subscription-backed setup inside ForgeGod, use:
planner = "zai:glm-5.1"
researcher = "zai:glm-5.1"
coder = "zai:glm-5.1"
reviewer = "openai-codex:gpt-5.4"
sentinel = "openai-codex:gpt-5.4"
escalation = "openai-codex:gpt-5.4"
See docs/GLM_CODEX_HARNESS_2026-04-08.md and docs/examples/glm_codex_coding_plan.toml, and run `python scripts/smoke_glm_codex_harness.py` before high-stakes use.
This harness is research-backed and works in ForgeGod today. The `ZAI_CODING_API_KEY` path should still be treated as experimental and at-your-own-risk until Z.AI explicitly recognizes ForgeGod as a supported coding tool.
Run `forgegod doctor` – it checks your setup and tells you exactly what to fix.
If you want the real strict sandbox, read docs/STRICT_SANDBOX_SETUP.md. It explains Docker Desktop, the required sandbox image, and the safe fix path in non-technical terms.
# Install
pip install forgegod
# Initialize a project
forgegod init
# Check native auth surfaces
forgegod auth status
# Link ChatGPT-backed OpenAI Codex subscription, then sync config defaults
forgegod auth login openai-codex
forgegod auth sync
# Single task
forgegod run "Add a /health endpoint to server.py with uptime and version info"
# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"
# 24/7 autonomous loop from PRD
# Loop defaults: no auto-commit or auto-push unless you explicitly enable those flags
# Parallel workers require a git repo with at least one commit because ForgeGod uses isolated worktrees
forgegod loop --prd .forgegod/prd.json
# Caveman mode – 50-75% token savings with ultra-terse prompts
forgegod run --terse "Add a /health endpoint"
# Check what it learned
forgegod memory
# View cost breakdown
forgegod cost
# Benchmark your models
forgegod benchmark
# Install a DESIGN.md preset for frontend work
forgegod design pull claude
# Plan a contribution against another repo
forgegod contribute https://github.com/owner/repo --goal "Improve tests"
# Health check
forgegod doctor

ForgeGod auto-detects your environment on first run:
- Finds API keys in env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, `GOOGLE_API_KEY`/`GEMINI_API_KEY`, `DEEPSEEK_API_KEY`, `MOONSHOT_API_KEY`, `ZAI_CODING_API_KEY`, `ZAI_API_KEY`) and detects native OpenAI Codex login state
- Checks if Ollama is running locally
- Detects your project language, test framework, and linter
- Picks auth-aware model defaults for each role based on what's available
- Creates `.forgegod/config.toml` with sensible defaults
No manual setup required. Just run `forgegod init` and go.
If you add a new provider later, run `forgegod auth sync` to rewrite model defaults from detected auth surfaces.
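Conceptually, the key-detection step is just a scan over known env var names. The sketch below is illustrative (`detect_providers` is a hypothetical name); the env vars themselves are the ones this README documents.

```python
# Illustrative first-run provider detection: a provider counts as available
# if any of its known env vars is set and non-empty. Not ForgeGod's real code.
import os

PROVIDER_ENV_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "gemini": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
    "deepseek": ["DEEPSEEK_API_KEY"],
    "kimi": ["MOONSHOT_API_KEY"],
    "zai": ["ZAI_CODING_API_KEY", "ZAI_API_KEY"],
}

def detect_providers(environ=os.environ):
    """Return the sorted list of providers with at least one key present."""
    return sorted(
        provider
        for provider, keys in PROVIDER_ENV_KEYS.items()
        if any(environ.get(k) for k in keys)
    )
```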
┌───────────────────────────────────────────────────┐
│                    RALPH LOOP                     │
│                                                   │
│  ┌──────┐   ┌───────┐   ┌─────────┐   ┌─────┐     │
│  │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │     │
│  │ PRD  │   │ AGENT │   │ STORY   │   │IDATE│     │
│  └──────┘   └───────┘   └─────────┘   └──┬──┘     │
│     ▲                                    │        │
│     │      ┌────────┐   ┌────────┐       │        │
│     └──────│ ROTATE │◀──│ COMMIT │◀──────┘        │
│            │CONTEXT │   │OR RETRY│  pass          │
│            └────────┘   └────────┘                │
│                                                   │
│  Progress is in GIT, not LLM context.             │
│  Fresh agent per story. No context rot.           │
│  Create .forgegod/KILLSWITCH to stop.             │
└───────────────────────────────────────────────────┘
- Read PRD → Pick the highest-priority TODO story
- Spawn agent → Fresh context (progress is in git, not memory)
- Execute → Agent uses 23 tools to implement the story
- Validate → Tests, lint, syntax, frontier review
- Finalize or retry → Pass: review diff + mark done. Fail: retry up to 3x with model escalation
- Rotate → Next story. Context is always fresh.
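Those steps can be sketched as a small control loop. This is illustrative Python, not the real loop.py: `run_story` and `validate` are hypothetical callbacks, and the PRD is modeled as a plain list of story dicts.

```python
# Illustrative Ralph-loop skeleton: progress lives in the PRD/git, each story
# gets a fresh agent call, and a KILLSWITCH file halts the loop between stories.
import os

def ralph_loop(prd, run_story, validate, max_retries=3,
               killswitch=".forgegod/KILLSWITCH"):
    done = []
    while True:
        if os.path.exists(killswitch):
            return done  # operator asked for an immediate stop
        todo = [s for s in prd if s["status"] == "todo"]
        if not todo:
            return done
        story = max(todo, key=lambda s: s["priority"])  # highest priority first
        for attempt in range(max_retries):
            result = run_story(story, attempt)  # fresh agent context per call
            if validate(result):
                story["status"] = "done"
                done.append(story["id"])
                break
        else:
            story["status"] = "blocked"  # retries exhausted; move on, no loop stall
```

Note the shape of the retry logic: a story that fails all attempts is marked blocked rather than retried forever, so the loop always makes forward progress.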
ForgeGod has the most advanced memory system of any open-source coding agent:
| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern β fix mapping | Fuzzy match lookup | Indefinite |
Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.
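Reciprocal Rank Fusion itself is tiny. A minimal generic sketch, assuming two already-ranked result lists (e.g. one from FTS5, one from Jaccard similarity) rather than ForgeGod's actual retriever:

```python
# Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per
# document; documents that rank well in either list float to the top of the
# fused ordering. k=60 is the value commonly used in the RRF literature.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only looks at ranks, not raw scores, it needs no calibration between the lexical and set-similarity retrievers it fuses.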
# Check memory health
forgegod memory
# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)

| Mode | Behavior | Trigger |
|---|---|---|
| `normal` | Use all configured models | Default |
| `throttle` | Prefer local, cloud for review only | 80% of daily limit |
| `local-only` | Ollama only, $0 operation | Manual or 95% limit |
| `halt` | Stop all LLM calls | 100% of daily limit |
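The mode thresholds above reduce to a simple spend-fraction check. This is an illustrative sketch (`budget_mode` is a hypothetical name); the real budget.py also tracks burn rate and forecasting.

```python
# Illustrative budget-mode selection from daily spend. Thresholds mirror the
# table above; a manual override (e.g. FORGEGOD_BUDGET_MODE) wins outright.
def budget_mode(spent_usd, daily_limit_usd, override=None):
    if override:
        return override
    frac = spent_usd / daily_limit_usd
    if frac >= 1.0:
        return "halt"        # 100% of daily limit: stop all LLM calls
    if frac >= 0.95:
        return "local-only"  # 95%: Ollama only, $0 operation
    if frac >= 0.80:
        return "throttle"    # 80%: prefer local, cloud for review only
    return "normal"
```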
# Check spend
forgegod cost
# Override mode
export FORGEGOD_BUDGET_MODE=local-only

Ultra-terse prompts that reduce token usage 50-75% with no accuracy loss for coding tasks. Backed by 2026 research:
- Mini-SWE-Agent – 100 lines, >74% SWE-bench Verified
- Chain of Draft – 7.6% of the tokens, same accuracy
- CCoT – 48.7% shorter, negligible impact
# Add --terse to any command
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"
# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for planner/reviewer stay byte-identical.
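The traceback compression is the easiest piece to picture. A minimal sketch, assuming plain CPython traceback text (`compress_traceback` is a hypothetical name, not ForgeGod's API):

```python
# Illustrative "tracebacks -> last frame only" compression: keep the header,
# the final frame, and the exception line; elide everything in between.
def compress_traceback(tb: str) -> str:
    lines = tb.rstrip().splitlines()
    frame_starts = [i for i, l in enumerate(lines)
                    if l.lstrip().startswith("File ")]
    if len(frame_starts) < 2:
        return tb  # one frame or none: nothing worth dropping
    return "\n".join([lines[0], "  ...", *lines[frame_starts[-1]:]])
```

The last frame plus the exception line is usually all a fixer model needs; the dropped middle frames are recoverable by re-running the command if they ever matter.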
ForgeGod uses TOML config with 3-level priority: env vars > project > global.
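That priority order is just a dict merge from lowest to highest precedence. A minimal sketch (key names here are illustrative, not ForgeGod's actual schema):

```python
# Illustrative env > project > global resolution: each later update() only
# overrides keys the higher-precedence source actually sets.
def resolve_config(env, project, global_cfg):
    merged = dict(global_cfg)
    merged.update(project)  # project config overrides global config
    merged.update(env)      # env vars override everything
    return merged
```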
A fresh `forgegod init` and `forgegod auth sync` write auth-aware defaults. The example below shows the file shape, not the only valid mapping.
# .forgegod/config.toml
[models]
planner = "openai:gpt-4o-mini" # Cheap planning
coder = "ollama:qwen3-coder-next" # Free local coding
reviewer = "openai:o4-mini" # Quality gate
sentinel = "openai:gpt-4o" # Frontier sampling
escalation = "openai:gpt-4o" # Fallback for hard problems
[budget]
daily_limit_usd = 5.00
mode = "normal"
[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true
[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"
[terse]
enabled = false # --terse flag or set true here
[security]
sandbox_mode = "standard" # permissive | standard | strict
sandbox_backend = "auto" # auto | docker
sandbox_image = "mcr.microsoft.com/devcontainers/python:1-3.13-bookworm"
redact_secrets = true
audit_commands = true

export OPENAI_API_KEY="sk-..."
forgegod auth login openai-codex # Native ChatGPT-backed OpenAI auth
export ANTHROPIC_API_KEY="sk-ant-..." # Optional
export OPENROUTER_API_KEY="sk-or-..." # Optional
export GOOGLE_API_KEY="AIza..." # Optional (Gemini)
export DEEPSEEK_API_KEY="sk-..." # Optional
export MOONSHOT_API_KEY="sk-..." # Optional (Kimi / Moonshot)
export ZAI_CODING_API_KEY="..." # Optional (Z.AI Coding Plan)
export ZAI_API_KEY="..." # Optional (Z.AI general API)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI API | gpt-4o, gpt-4o-mini, o3, o4-mini | $$ | OPENAI_API_KEY |
| OpenAI Codex subscription | gpt-5.4 via Codex auth surface | Included in supported ChatGPT plans | forgegod auth login openai-codex |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | $ | DEEPSEEK_API_KEY |
| Kimi (Moonshot direct) | kimi-k2.5, kimi-k2-thinking | $$ | MOONSHOT_API_KEY |
| Z.AI / GLM | glm-5.1, glm-5, glm-4.7 | $$ | ZAI_CODING_API_KEY or ZAI_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |
Kimi support uses Moonshot's official OpenAI-compatible API and is currently experimental in ForgeGod. Benchmark it on your workload before making it a default role. OpenAI Codex subscription support is strongest today for planner/reviewer/adversary flows. It also works as a ForgeGod route surface for coding, but coder-loop use remains experimental and should be benchmarked before you make it the default remote coder. OpenRouter still uses keys/credits. Alibaba/Qwen Coding Plan is still under evaluation because current official docs scope it to supported coding tools rather than generic autonomous loops.
Run your own: `forgegod benchmark`
| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |
Run `forgegod benchmark --update-readme` to refresh with your own results.
forgegod/
├── cli.py          # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py       # TOML config + env vars + 3-level priority
├── router.py       # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py        # Core agent loop (tools + context compression + sub-agents)
├── coder.py        # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py         # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py      # Task decomposition → PRD
├── reviewer.py     # Frontier model quality gate (sample-based)
├── sica.py         # Self-improving strategy modification (guardrails + audit policy)
├── memory.py       # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py       # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py     # Parallel git worktree workers
├── tui.py          # Rich terminal dashboard
├── terse.py        # Caveman mode – terse prompts, tool compression, savings tracker
├── benchmark.py    # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py   # Interactive setup wizard for new users
├── doctor.py       # Installation health check (6 diagnostic checks)
├── i18n.py         # Translation strings (English + Spanish es-419)
├── models.py       # Pydantic v2 data models
└── tools/
    ├── filesystem.py  # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
    ├── shell.py       # bash (isolated runtime env + strict command policy + secret redaction)
    ├── git.py         # git status, diff, commit, worktrees
    ├── mcp.py         # MCP server client (5,800+ servers)
    └── skills.py      # On-demand skill loading
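router.py names a half-open circuit breaker; the pattern itself looks like the generic sketch below (not ForgeGod's implementation). A breaker opens after repeated provider failures, stays open for a cooldown, then allows one half-open trial call before fully closing again.

```python
# Generic half-open circuit breaker sketch: closed -> open on repeated
# failures -> half-open after a cooldown -> closed again on one success.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"  # cooldown elapsed: allow one trial call
        return "open"

    def allow(self):
        return self.state() != "open"

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None  # any success fully closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip: start the cooldown
```

Injecting `clock` keeps the breaker testable without real sleeps, which is the usual reason cascade routers wrap providers this way.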
Defense-in-depth, not security theater:
- Real strict sandbox – `strict` runs inside Docker with no network, read-only rootfs, dropped caps, and workspace-only mounts
- Standard shell policy – `standard` keeps the local guardrails: isolated runtime dirs, blocked shell operators, and workspace scoping
- Secret redaction – 11 patterns strip API keys from tool output before LLM context
- Prompt injection detection – 8 patterns scan for jailbreak/role-override attempts
- AST code validation – Detects obfuscated dangerous calls (`getattr(os, 'system')`) that regex misses, and blocks suspicious writes in `strict` mode
- Workspace-scoped file ops – File and shell tools reject paths that escape the active workspace root
- Supply chain defense – Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
- Canary token system – Detects if the system prompt leaks into tool arguments, with per-session rotation
- Budget limits – Cost controls with token tracking + burn-rate forecasting
- Killswitch – Create `.forgegod/KILLSWITCH` to immediately halt autonomous loops
- Sensitive file protection – `.env` and credentials files get warnings + automatic redaction
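The secret-redaction idea in miniature: scrub known key shapes from tool output before it ever reaches LLM context. The patterns below are illustrative examples, not ForgeGod's actual 11.

```python
# Illustrative regex-based secret redaction. Patterns are examples only;
# real deployments maintain a larger, provider-specific list.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),      # OpenAI/Anthropic-style keys
    re.compile(r"AIza[A-Za-z0-9_-]{20,}"),     # Google-style keys
]

def redact(text: str) -> str:
    """Replace every matched secret with a fixed placeholder."""
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```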
Warning: ForgeGod executes shell commands and modifies files. As of the verified 2026-04-08 baseline, `strict` uses a real Docker sandbox backend and blocks if Docker/image prerequisites are missing, while `standard` remains a host-local guarded workflow. Use `forgegod doctor` and docs/STRICT_SANDBOX_SETUP.md instead of weakening the sandbox just to get past setup friction.
- AGENTS.md – repo-local instructions for coding agents
- docs/OPERATIONS.md – current system of record and verified commands
- docs/AUDIT_2026-04-07.md – detailed code audit and remediation order
- docs/WEB_RESEARCH_2026-04-07.md – external guidance used to shape the repo docs
See SECURITY.md for the full policy and vulnerability reporting.
We welcome contributions. See CONTRIBUTING.md for guidelines.
- Bug reports and feature requests: GitHub Issues
- Questions and discussion: GitHub Discussions
ForgeGod credits code and non-code work in public.
- Matias Mesa – `design` – official ForgeGod mascot system
- WAITDEAD – `code`, `infra`, `research`, `projectManagement`, `maintenance`
See CONTRIBUTORS.md for the current contributor list.
Apache 2.0 – see LICENSE.
Built by WAITDEAD • Official mascot design by Matias Mesa • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.