Devkit

A deterministic development harness for AI agents. The MCP engine controls workflow execution (step ordering, gates, loops, branches); the agent supplies the creative work. Every step is enforced, measured, and auditable.

Works with just Claude. Optionally adds Codex and Gemini for multi-agent consensus.

Install

1. Devkit (required)

/plugin marketplace add 5uck1ess/marketplace
/plugin install devkit@5uck1ess-plugins

Auto-updates are enabled by default. Devkit updates itself when you restart Claude Code.

2. Multi-agent plugins (optional)

These let the tri-* commands (tri-review, tri-debug, tri-security, etc.) run Claude, Codex, and Gemini in parallel.

# Codex plugin
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex

# Gemini plugin
/plugin marketplace add abiswas97/gemini-plugin-cc
/plugin install gemini@abiswas97-gemini

If the plugins aren't installed, devkit falls back to the standalone CLIs:

brew install codex gemini-cli

3. Companion plugins (optional)

These handle concerns devkit doesn't — methodology, specialized reviews, and context management. No overlap.

# Methodology — brainstorming, planning, TDD, verification, debugging
/plugin install superpowers@claude-plugins-official

# Specialized review agents — comment accuracy, type design, silent failures
/plugin install pr-review-toolkit@claude-plugins-official

# Deep feature exploration — parallel codebase analysis, architecture proposals
/plugin install feature-dev@claude-plugins-official

# Quick commits — /commit, /commit-push-pr, /clean_gone
/plugin install commit-commands@claude-plugins-official

# Hook creation — markdown rules, hot reload, conversation analysis
/plugin install hookify@claude-plugins-official

# Skill development — eval/benchmark framework, blind A/B testing
/plugin install skill-creator@claude-plugins-official

# Context window management — sandboxes large outputs, 98% token savings
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

4. Optional tools

brew install rtk       # Token optimization (60-90% savings on Bash output)
brew install ast-grep  # AST-based repo mapping (used by onboard skill)

# Browser automation — enables scrape (JS-rendered), screenshot, and browser skills
npx playwright install chromium

Playwright (optional) enables three skills: enhanced scrape for JS-heavy sites, screenshot for page captures, and browser for full automation (clicking, form filling, multi-step flows, codegen). Free and local — no API keys. Install only the browsers you need (chromium is ~170MB).

Verify

/devkit:status

This shows which CLIs are installed, which agents are available, and which commands are ready.

Quick Start

# These activate automatically — just ask naturally:
# "write tests for src/parser.ts"
# "generate a changelog"
# "help me understand this codebase"
# "research the best auth library for Node"

# Slash commands for complex workflows:
/tri-review                   # Multi-agent code review
# Or just describe: "submit a PR", "ship this" → pr-ready skill auto-activates

How It Works

Devkit runs as an MCP server inside Claude Code. When a workflow starts, the engine takes control:

devkit_start("research", "best Go testing frameworks")
  → Engine creates session, returns Step 1 + condensed principles
  → Claude executes the step using standard tools
  → Claude calls devkit_advance(session_id)
  → Engine validates, records output, returns Step 2
  → ...repeat until WORKFLOW COMPLETE

Enforcement (runs automatically):
  PreToolUse hook → blocks out-of-step actions during command steps (and prompt steps under hard enforcement)
  Stop hook → prevents session end during active workflows

Why MCP? Claude can't skip steps because the engine controls what comes next. Claude can't call tools that aren't valid for the current step. The engine holds state — Claude doesn't self-report.

Commands

All skills are tab-completable slash commands in current Claude Code. The primary user-facing entry points:

Command	What it does
`/tri-review`	Code review from 1-3 agents, consolidated report
`/tri-debug`	Independent root-cause analysis from each agent
`/tri-security`	Security audit with severity-ranked consensus
`/devkit:status`	Health check
`/devkit:setup-rules`	Install language-specific coding rules to `~/.claude/rules/` (user-only — `disable-model-invocation` prevents auto-trigger)

Every workflow also has a dedicated slash command: /feature, /bugfix, /audit, /refactor, /pr-ready, /self-*, etc. Tasks like "ship this PR" or "submit a PR" also auto-activate the pr-ready skill via natural language.

Workflows

All 21 YAML workflows are invoked via the MCP engine. Every workflow has a trigger skill so natural-language keywords dispatch deterministically — saying "build a feature", "fix this bug", "tri review", or "deep research X" fires the matching skill, which calls devkit_start and the engine takes over.

Workflow	What it does
`feature`	Brainstorm, plan, implement, test, lint, review
`bugfix`	Reproduce, diagnose, fix, regression test, verify
`refactor`	Analyze smells, plan, restructure, verify nothing broke
`research`	Clarify, decompose, parallel search, corroborate, synthesize
`deep-research`	Analysis of Competing Hypotheses (ACH): hypotheses, disconfirmation, evidence matrix
`self-test`	Run tests, fix failures, repeat until passing
`self-lint`	Run linter, fix violations, repeat until clean
`self-perf`	Benchmark, optimize, repeat until target met
`self-improve`	Run metric, fix issues, repeat until passing
`self-migrate`	Migrate code incrementally with test gate
`self-audit`	Measure codebase, rank improvements by evidence
`autoloop`	Autonomous audit/fix/measure/keep-or-revert loop
`audit`	Dependencies, vulnerabilities, licenses, lint, security
`pr-ready`	Full PR preparation pipeline
`tri-review`	Multi-agent code review
`tri-debug`	Multi-agent debugging
`tri-security`	Multi-agent security audit
`tri-dispatch`	Send any task to multiple agents
`test-gen`	Generate tests via test-writer agent, iterate until passing
`doc-gen`	Generate docs via documenter agent
`onboard`	Generate codebase onboarding guide via researcher agent
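Several of these workflows (self-test, self-lint, self-perf, self-improve, autoloop) are loops with a gate: make a change, measure, keep or revert. The Go sketch below shows that pattern under stated assumptions: `Attempt` and `Gate` are hypothetical callbacks, the gate returns a numeric score, and a change is kept only if the score does not regress. Devkit's actual keep/revert policy and budget handling may differ.

```go
package main

import "fmt"

// Attempt applies one candidate change (hypothetical, e.g. an agent's fix)
// and returns a function that undoes it.
type Attempt func(iter int) (revert func())

// Gate measures the result (hypothetical, e.g. number of passing tests);
// higher is better, and done reports whether the target is met.
type Gate func() (score int, done bool)

// RunLoop applies changes one at a time, reverts any change that lowers
// the score, and stops when the gate reports done or maxIter is reached.
func RunLoop(maxIter int, attempt Attempt, gate Gate) (iters int, reached bool) {
	best, done := gate()
	for iters < maxIter && !done {
		iters++
		revert := attempt(iters)
		score, ok := gate()
		if score < best {
			revert() // regression: discard this change
			continue
		}
		best, done = score, ok
	}
	return iters, done
}

func main() {
	passing := 3 // hypothetical: 3 of 5 tests pass at the start
	attempt := func(iter int) func() {
		delta := 1
		if iter%2 == 0 {
			delta = -1 // every second attempt makes things worse
		}
		passing += delta
		return func() { passing -= delta }
	}
	gate := func() (int, bool) { return passing, passing == 5 }
	iters, ok := RunLoop(10, attempt, gate)
	fmt.Println(iters, ok, passing)
}
```

With the demo values in `main`, the second attempt regresses and is reverted, and the loop reaches the target on the third attempt.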

Skills

Skills activate automatically based on context; no slash command is needed. Every workflow has a matching trigger skill: saying the keyword dispatches to the engine, which then enforces every step.

Workflow trigger skills (dispatch to engine-enforced workflows):

Trigger	Skill → Workflow
"build a feature", "new feature X"	`feature`
"fix this bug", "this is broken"	`bugfix`
"refactor this", "clean up X"	`refactor`
"audit this project", "project health"	`audit`
"research X"	`research`
"deep research", "validate this"	`deep-research`
"make a PR", "ship this", "create a pull request"	`pr-ready`
"tri review", "triple review"	`tri-review`
"tri debug", "triple debug"	`tri-debug`
"tri security", "triple security audit"	`tri-security`
"tri dispatch", "send to three models"	`tri-dispatch`
"self-audit", "audit the codebase"	`self-audit`
"self-improve", "keep fixing until X passes"	`self-improve`
"self-lint", "fix all lint"	`self-lint`
"self-migrate", "migrate incrementally"	`self-migrate`
"self-perf", "optimize performance"	`self-perf`
"self-test", "fix failing tests"	`self-test`
"autoloop", "run experiments overnight"	`autoloop`
"write tests for X"	`test-gen`
"document this module"	`doc-gen`
"onboard to this codebase"	`onboard`

Other skills (tools, meta-orchestration, content):

Trigger	Skill
"generate a changelog"	`changelog`
"create an ADR"	`adr`
"mega PR review"	`mega-pr` (dispatches tri-review + pr-review-toolkit in parallel)
"scrape this URL"	`scrape`
"screenshot this page"	`screenshot` (requires Playwright)
"automate this browser flow"	`browser` (requires Playwright)
Google Workspace CLI commands	`gcli`

Coding principles (clean-code, dry, yagni, dont-reinvent, executing, stuck, scratchpad) are injected as condensed rules (~120 tokens) per workflow step — not loaded as full skill files.

Hooks

12 hooks across 4 lifecycle events. All installed automatically with the plugin.

Event	Hook	What it catches
PreToolUse	safety-check	`rm -rf /`, `DROP TABLE`, force push, editing secrets
PreToolUse	security-patterns	`eval()`, XSS, shell injection, weak hashes, hardcoded secrets
PreToolUse	audit-trail	Logs every command to `.devkit/audit.log`
PreToolUse	pr-gate	Prompts to run the pr-ready skill before `gh pr create`
PreToolUse	rtk-rewrite	Compresses Bash output via RTK (no-op if not installed)
PreToolUse	devkit-guard	Hard enforce: blocks out-of-step tools during workflow command and prompt steps. Soft enforce: emits a reminder instead. Skills are intentionally unguarded.
PostToolUse	post-validate	Suppressed errors, leaked secrets, writes outside repo
PostToolUse	slop-detect	AI code patterns — doc/code imbalance, restating comments
PostToolUse	lang-review	Language-aware checks: Go, TypeScript, Rust, Python, Shell
SubagentStop	subagent-stop	Verifies subagent work before accepting
Stop	stop-gate	Merge conflicts, cross-domain test gaps, linter pass
Stop	devkit-stop-guard	Blocks session end during active workflows

Agents

Agent	Model	Used by
`reviewer`	Opus	tri-review, feature workflows
`researcher`	Sonnet	research, deep-research, tri-debug, onboard workflows
`improver`	Opus	self-improve, self-lint, self-perf, refactor workflows
`test-writer`	Sonnet	self-test, test-gen workflows
`documenter`	Haiku	doc-gen workflow
`security-auditor`	Opus	tri-security, pr-ready, audit workflows

All agents run in worktree isolation.

Coding Rules

Language-specific rules that auto-activate when Claude reads matching files. They are installed to ~/.claude/rules/. Rules guide how code gets written; hooks catch what slips through.

/devkit:setup-rules

Language	Examples
Go	Error wrapping, context.Context, defer traps, JSON float64 gotcha
TypeScript	`unknown` not `any`, discriminated unions, catch narrowing
Python	Exception chains, type hints, dataclasses, pathlib
Rust	Ownership, `?` propagation, newtypes, clippy-as-errors
Shell	`set -euo pipefail`, quoting, macOS portability

Architecture

MCP Server (bin/devkit mcp — auto-started by plugin)
  ├── bin/devkit = POSIX shell wrapper (committed to git)
  │   └── On first run, downloads matching release asset from GitHub,
  │       verifies SHA256, caches as bin/devkit-engine-v<ver>-<os>-<arch>,
  │       then execs it. Local dev builds (make install-plugin) are used
  │       directly via the fast path.
  ├── Tools: devkit_start, devkit_advance, devkit_status, devkit_list
  ├── State: session.json (hot, <50ms reads) + SQLite (cold history)
  ├── Parse YAML → validate steps, branches, budget
  ├── Walk steps:
  │   ├── Command steps → engine executes shell directly ($0 cost)
  │   │   Values passed via $DEVKIT_INPUT / $DEVKIT_OUT_<step_id>
  │   │   env vars — never interpolated into the command string.
  │   ├── Prompt steps → Claude works, calls devkit_advance when done
  │   ├── Loop with gate → run, verify, keep or revert
  │   ├── Branch → case-insensitive word-boundary match → goto
  │   └── Parallel → Agent tool dispatch (Claude/Codex/Gemini)
  └── Principles injected per step (~120 tokens, not full skill files)
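The Branch rule (case-insensitive, word-boundary match, then goto) can be sketched with Go's `regexp` package. The `Case` type and `Resolve` function are hypothetical, as are the first-match-wins ordering and the default target; the sketch only demonstrates why word boundaries matter, e.g. a `pass` arm should not fire on `bypass`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Case is a hypothetical branch arm: if Match appears in the previous
// step's output as a whole word, ignoring case, jump to Goto.
type Case struct {
	Match string
	Goto  string
}

// Resolve returns the Goto of the first matching case, or def if none match.
// QuoteMeta treats Match as literal text; \b requires word boundaries.
func Resolve(output string, cases []Case, def string) string {
	for _, c := range cases {
		re := regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(c.Match) + `\b`)
		if re.MatchString(output) {
			return c.Goto
		}
	}
	return def
}

func main() {
	arms := []Case{{Match: "fail", Goto: "fix"}, {Match: "pass", Goto: "done"}}
	fmt.Println(Resolve("Verdict: PASS", arms, "retry"))
	fmt.Println(Resolve("bypass cache", arms, "retry"))
}
```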

Enforcement:
  ├── MCP tool scoping — Claude can only call devkit_advance to progress
  ├── PreToolUse hook — exit 2 blocks tools during command steps (and prompt steps under hard enforce)
  └── Stop hook — blocks session end during active workflows

Terminal usage (devkit workflow <name> "<description>"):
  └── Subprocess runners for Codex/Gemini CLI usage
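The command-step rule in the MCP server tree above (values arrive via `$DEVKIT_INPUT` / `$DEVKIT_OUT_<step_id>` environment variables and are never interpolated into the command string) is what prevents shell injection. Here is a minimal Go sketch with `os/exec`, assuming the command runs under `sh -c` and that step IDs are valid shell identifiers. The command text stays fixed, and a hostile input such as `$(whoami); echo pwned` is printed literally, because the shell does not re-parse the result of a quoted parameter expansion.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runCommandStep runs a fixed command string from the workflow definition.
// Dynamic values reach it only as environment variables, never by string
// concatenation, so shell syntax inside them is not executed.
// Assumption: step IDs are valid shell identifiers (e.g. "lint").
func runCommandStep(command, input string, prior map[string]string) (string, error) {
	cmd := exec.Command("sh", "-c", command)
	env := append(os.Environ(), "DEVKIT_INPUT="+input)
	for id, out := range prior {
		env = append(env, "DEVKIT_OUT_"+id+"="+out)
	}
	cmd.Env = env
	out, err := cmd.Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	out, err := runCommandStep(`echo "input was: $DEVKIT_INPUT"`, "$(whoami); echo pwned", nil)
	fmt.Println(out, err)
}
```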

Repository Structure

devkit/
├── commands/          # Legacy (references/ only); new entry points go in skills/
├── skills/            # 38 skills (workflow triggers, principles, tools, utilities) + _principles.yml
├── agents/            # 6 agents (reviewer, researcher, improver, ...)
├── hooks/             # 12 hooks (safety, security, quality gates, workflow enforcement)
├── workflows/         # 21 YAML workflow definitions
├── resources/rules/   # Language-specific coding rules
├── src/               # Go engine + MCP server
│   ├── mcp/           # MCP server (tools, principles loader, session management)
│   ├── engine/        # YAML workflow engine (parser, executor, tests)
│   ├── runners/       # Codex, Gemini interfaces (terminal fallback)
│   ├── lib/           # DB, git, metrics, session state, reporting
│   └── cmd/           # CLI entry points (including `devkit mcp`)
├── bin/               # devkit wrapper (committed) + downloaded engine binaries (gitignored)
└── .github/workflows/ # CI (build+test+vet) + auto-release (6 platforms)

Release History

Version	Changes	Urgency	Date
v2.1.36	Merged: ci(release): bump mcpb/manifest.json in lockstep + rebuild bundle	High	5/22/2026
v2.1.32	Merged: test(mcp): add stdout regression test for devkit mcp	High	5/19/2026
v2.1.29	Merged: feat(cmd): add approve subcommand for workflow gates	High	4/18/2026
v2.1.28	Merged: Probe-local tri-review follow-ups: Status enum + shared probe + hint/test/contract fixes	High	4/17/2026
v2.1.27	Merged: feat: local runner health probe	High	4/17/2026
v2.1.26	Merged: fix(guard): cross-repo scope + companion-rescue Bash for tri-review	High	4/17/2026
v2.1.25	Merged: fix(guard): allow Agent/Task dispatch on prompt+hard steps	High	4/17/2026
v2.1.24	Merged: fix: scope stop-guard to originating repo + force tri-* dispatch	High	4/15/2026
v2.1.23	Merged: feat: local runner + stealth scraping backends (Camoufox, Scweet)	High	4/15/2026
v2.1.22	Merged: feat: harness-audit workflow + expanded language rules	High	4/13/2026
v2.1.21	Merged: fix(skills): remove unquoted ": " from 10 SKILL.md descriptions	Medium	4/11/2026
v2.1.20	Merged: fix: populate agent bodies, sync mcpb version, update layout docs	High	4/11/2026
v2.1.19	Merged: fix(readme): standardize bare slash commands, rename status→health	Medium	4/11/2026
v2.1.18	Merged: refactor(engine): enforce type design for EnforceMode (closes #81)	High	4/11/2026
v2.1.17	Merged: docs: add CLAUDE.md as token-efficient navigation map	Medium	4/11/2026
v2.1.16	Merged: feat(engine): per-step enforce override + surgical soft-flips (#78)	Medium	4/11/2026
v2.1.15	Merged: docs(skills): description-quality pass across 13 collision-prone skills	Medium	4/11/2026
v2.1.14	Merged: refactor(plugin): migrate commands/ to skills/	Medium	4/11/2026
v2.1.13	Merged: fix(hooks): resolve #73 #74 #75 follow-ups (test harness, path handling, set -euo)	Medium	4/11/2026
v2.1.12	Merged: fix(hooks): three silent-failure bugs found during hook audit	Medium	4/11/2026
v2.1.11	Merged: feat(skills): deterministic dispatch for every devkit workflow	Medium	4/11/2026
v2.1.10	Merged: fix(workflows): tri-review + tri-security enforce: soft	Medium	4/11/2026
v2.1.9	Merged: fix(hooks): native devkit-engine guard subcommand (closes #65)	Medium	4/11/2026
v2.1.8	Merged: fix(hooks): enforce workflow progression on prompt steps + orphan recovery	Medium	4/11/2026
v2.1.7	Merged: fix(windows): real Go launcher for devkit MCPB bundle (closes #60)	Medium	4/10/2026
v2.1.6	Merged: bin/devkit: fix Windows first-run install (curl+schannel bug, #58)	Medium	4/10/2026
v2.1.5	Merged: hooks: auto-ignore .devkit/ in host repos on first run	Medium	4/10/2026
v2.1.4	Merged: feat(pr-ready): add doc-check step to the workflow	Medium	4/10/2026
v2.1.3	Merged: fix(wrapper): make engine download crash-safe and resumable	Medium	4/10/2026
v2.1.2	Merged: fix: bootstrap engine binary on first run via committed wrapper	Medium	4/10/2026
v2.1.1	Merged: feat: Playwright skills (screenshot, browser) + scrape backend	Medium	4/10/2026
v2.1.0	Merged: MCP engine: deterministic workflow enforcement via tool scoping + hooks	Medium	4/10/2026
v2.0.39	Merged: Remove docs/specs from public repo	Medium	4/10/2026
v2.0.38	Merged: Pass version tag to binary build for correct devkit --version	Medium	4/10/2026
v2.0.37	Merged: Fix publish job: cd to /tmp broke gh release upload	Medium	4/10/2026
v2.0.36	Merged: Add changelog for deterministic workflow conversion	Medium	4/10/2026
v2.0.35	Merged: Fix stale command references in README and status.md	Medium	4/10/2026
v2.0.34	Merged: PR 6: Add expect field to engine for command step assertions	Medium	4/10/2026
v2.0.33	Merged: PR 5: Trim commands — delete 16, keep 8 entry points	Medium	4/9/2026
v2.0.32	Merged: Add mega-pr skill for combined parallel PR review	Medium	4/9/2026
v2.0.31	Merged: PR 4: Convert remaining 12 commands to thin wrappers	Medium	4/9/2026
v2.0.30	Merged: PR 3: Convert bugfix and feature commands to thin wrappers	Medium	4/9/2026
v2.0.29	Merged: PR 2: Convert self-improvement loops to deterministic command+gate	Medium	4/9/2026
v2.0.28	Merged: PR 1: Convert research skills to deterministic YAML wrappers	Medium	4/9/2026
v2.0.27	Merged: Fix marketplace install to use github shorthand for Update Now support	Medium	4/9/2026
v2.0.26	Merged: Add ERR trap logging and migrate to [[ ]] in stop-gate.sh	Medium	4/9/2026
v2.0.25	Merged: Fix stop-gate infinite loop on large TS projects + README rewrite	Medium	4/9/2026
v2.0.24	Merged: Add deterministic command steps and loop gates to workflow engine	Medium	4/9/2026
v2.0.23	Merged: Add domain probes, stub detection, and symptom triage	Medium	4/8/2026
v2.0.22	Merged: Supplement commands with autoresearch-inspired patterns	Medium	4/7/2026
v2.0.21	Merged: Fix tri-agent failures on large diffs	Medium	4/7/2026
v2.0.20	Merged: Add setup-rules command and coding rule reference files	Medium	4/6/2026
