A deterministic development harness for AI agents. The MCP engine controls workflow execution (step ordering, gates, loops, branches). The agent handles creativity. Every step is enforced, measured, and auditable.
Works with just Claude. Optionally adds Codex and Gemini for multi-agent consensus.
/plugin marketplace add 5uck1ess/marketplace
/plugin install devkit@5uck1ess-plugins

Auto-updates are enabled by default. Devkit updates itself when you restart Claude Code.
The optional Codex and Gemini plugins enable the tri:* commands (tri-review, tri-debug, tri-security, etc.), which run Claude + Codex + Gemini in parallel.
# Codex plugin
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex
# Gemini plugin
/plugin marketplace add abiswas97/gemini-plugin-cc
/plugin install gemini@abiswas97-gemini

If plugins aren't installed, the CLI fallbacks work too:
brew install codex gemini-cli

The following plugins handle concerns devkit doesn't: methodology, specialized reviews, and context management. No overlap.
# Methodology: brainstorming, planning, TDD, verification, debugging
/plugin install superpowers@claude-plugins-official
# Specialized review agents: comment accuracy, type design, silent failures
/plugin install pr-review-toolkit@claude-plugins-official
# Deep feature exploration: parallel codebase analysis, architecture proposals
/plugin install feature-dev@claude-plugins-official
# Quick commits: /commit, /commit-push-pr, /clean_gone
/plugin install commit-commands@claude-plugins-official
# Hook creation: markdown rules, hot reload, conversation analysis
/plugin install hookify@claude-plugins-official
# Skill development: eval/benchmark framework, blind A/B testing
/plugin install skill-creator@claude-plugins-official
# Context window management: sandboxes large outputs, 98% token savings
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

brew install rtk # Token optimization (60-90% savings on Bash output)
brew install ast-grep # AST-based repo mapping (used by onboard skill)
# Browser automation: enables scrape (JS-rendered), screenshot, and browser skills
npx playwright install chromium

Playwright (optional) enables three skills: enhanced scrape for JS-heavy sites, screenshot for page captures, and browser for full automation (clicking, form filling, multi-step flows, codegen). It is free and local, with no API keys. Install only the browsers you need (chromium is ~170MB).

/devkit:status

This shows which CLIs are installed, which agents are available, and which commands are ready.

# These activate automatically. Just ask naturally:
# "write tests for src/parser.ts"
# "generate a changelog"
# "help me understand this codebase"
# "research the best auth library for Node"
# Slash commands for complex workflows:
/tri:review # Multi-agent code review
# Or just describe: "submit a PR", "ship this" → pr-ready skill auto-activates

Devkit runs as an MCP server inside Claude Code. When a workflow starts, the engine takes control:
devkit_start("research", "best Go testing frameworks")
→ Engine creates session, returns Step 1 + condensed principles
→ Claude executes the step using standard tools
→ Claude calls devkit_advance(session_id)
→ Engine validates, records output, returns Step 2
→ ...repeat until WORKFLOW COMPLETE
Enforcement (runs automatically):
PreToolUse hook → blocks out-of-step actions during command steps
Stop hook → prevents session end during active workflows

Why MCP? Claude can't skip steps because the engine controls what comes next. Claude can't call tools that aren't valid for the current step. The engine holds state; Claude doesn't self-report.
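
The start/advance loop described above can be pictured as a small state machine in which the engine, not the agent, owns the step cursor. The following is a minimal Python sketch of that idea only, not devkit's Go implementation: the `Engine` and `Session` classes, the step names, the empty-output check, and the extra `output` argument to `advance` are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical step list; devkit's real workflows are defined in YAML files.
RESEARCH_STEPS = ["clarify", "decompose", "search", "corroborate", "synthesize"]


@dataclass
class Session:
    steps: list
    index: int = 0  # engine-owned cursor; the agent never sets it
    outputs: dict = field(default_factory=dict)


class Engine:
    """Toy engine: hands out one step at a time and records each output."""

    def __init__(self):
        self.sessions = {}

    def start(self, steps):
        """Create a session and return (session_id, first step)."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = Session(steps=list(steps))
        return session_id, steps[0]

    def advance(self, session_id, output):
        """Validate and record the current step's output, return the next step."""
        session = self.sessions[session_id]
        if session.index >= len(session.steps):
            return "WORKFLOW COMPLETE"
        if not output.strip():
            # Assumed validation rule: an empty result does not count as done.
            raise ValueError("step output is empty; refusing to advance")
        session.outputs[session.steps[session.index]] = output
        session.index += 1
        if session.index == len(session.steps):
            return "WORKFLOW COMPLETE"
        return session.steps[session.index]


# Usage: the caller can only ask for the next step, never jump ahead.
engine = Engine()
sid, step = engine.start(RESEARCH_STEPS)
while step != "WORKFLOW COMPLETE":
    step = engine.advance(sid, f"notes for {step}")
print(len(engine.sessions[sid].outputs), "steps recorded")
```

Because the only way forward is `advance`, the order of steps is decided entirely by the state held in the session, which is the property the "Why MCP?" paragraph relies on.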
All skills are tab-completable slash commands in current Claude Code. The primary user-facing entry points:
| Command | What it does |
|---|---|
| /tri-review | Code review from 1-3 agents, consolidated report |
| /tri-debug | Independent root-cause analysis from each agent |
| /tri-security | Security audit with severity-ranked consensus |
| /devkit:status | Health check |
| /devkit:setup-rules | Install language-specific coding rules to ~/.claude/rules/ (user-only; disable-model-invocation prevents auto-trigger) |
Every workflow also has a dedicated slash command: /feature, /bugfix, /audit, /refactor, /pr-ready, /self-*, etc. Tasks like "ship this PR" or "submit a PR" also auto-activate the pr-ready skill via natural language.
All 21 YAML workflows are invoked via the MCP engine. Every workflow has a trigger skill so natural-language keywords dispatch deterministically: saying "build a feature", "fix this bug", "tri review", or "deep research X" fires the matching skill, which calls devkit_start, and the engine takes over.
| Workflow | What it does |
|---|---|
| feature | Brainstorm, plan, implement, test, lint, review |
| bugfix | Reproduce, diagnose, fix, regression test, verify |
| refactor | Analyze smells, plan, restructure, verify nothing broke |
| research | Clarify, decompose, parallel search, corroborate, synthesize |
| deep-research | ACH: hypotheses, disconfirmation, evidence matrix |
| self-test | Run tests, fix failures, repeat until passing |
| self-lint | Run linter, fix violations, repeat until clean |
| self-perf | Benchmark, optimize, repeat until target met |
| self-improve | Run metric, fix issues, repeat until passing |
| self-migrate | Migrate code incrementally with test gate |
| self-audit | Measure codebase, rank improvements by evidence |
| autoloop | Autonomous audit/fix/measure/keep-or-revert loop |
| audit | Dependencies, vulnerabilities, licenses, lint, security |
| pr-ready | Full PR preparation pipeline |
| tri-review | Multi-agent code review |
| tri-debug | Multi-agent debugging |
| tri-security | Multi-agent security audit |
| tri-dispatch | Send any task to multiple agents |
| test-gen | Generate tests via test-writer agent, iterate until passing |
| doc-gen | Generate docs via documenter agent |
| onboard | Generate codebase onboarding guide via researcher agent |
Skills activate automatically based on context. No slash command needed. Every workflow has a matching trigger skill: saying the keyword dispatches to the engine, which then enforces every step.
Workflow trigger skills (dispatch to engine-enforced workflows):
| Trigger | Skill → Workflow |
|---|---|
| "build a feature", "new feature X" | feature |
| "fix this bug", "this is broken" | bugfix |
| "refactor this", "clean up X" | refactor |
| "audit this project", "project health" | audit |
| "research X" | research |
| "deep research", "validate this" | deep-research |
| "make a PR", "ship this", "create a pull request" | pr-ready |
| "tri review", "triple review" | tri-review |
| "tri debug", "triple debug" | tri-debug |
| "tri security", "triple security audit" | tri-security |
| "tri dispatch", "send to three models" | tri-dispatch |
| "self-audit", "audit the codebase" | self-audit |
| "self-improve", "keep fixing until X passes" | self-improve |
| "self-lint", "fix all lint" | self-lint |
| "self-migrate", "migrate incrementally" | self-migrate |
| "self-perf", "optimize performance" | self-perf |
| "self-test", "fix failing tests" | self-test |
| "autoloop", "run experiments overnight" | autoloop |
| "write tests for X" | test-gen |
| "document this module" | doc-gen |
| "onboard to this codebase" | onboard |
Other skills (tools, meta-orchestration, content):
| Trigger | Skill |
|---|---|
| "generate a changelog" | changelog |
| "create an ADR" | adr |
| "mega PR review" | mega-pr (dispatches tri-review + pr-review-toolkit in parallel) |
| "scrape this URL" | scrape |
| "screenshot this page" | screenshot (requires Playwright) |
| "automate this browser flow" | browser (requires Playwright) |
| Google Workspace CLI commands | gcli |
Coding principles (clean-code, dry, yagni, dont-reinvent, executing, stuck, scratchpad) are injected as condensed rules (~120 tokens) per workflow step, not loaded as full skill files.
12 hooks across 4 lifecycle events. All installed automatically with the plugin.
| Event | Hook | What it catches |
|---|---|---|
| PreToolUse | safety-check | rm -rf /, DROP TABLE, force push, editing secrets |
| PreToolUse | security-patterns | eval(), XSS, shell injection, weak hashes, hardcoded secrets |
| PreToolUse | audit-trail | Logs every command to .devkit/audit.log |
| PreToolUse | pr-gate | Prompts to run the pr-ready skill before gh pr create |
| PreToolUse | rtk-rewrite | Compresses Bash output via RTK (no-op if not installed) |
| PreToolUse | devkit-guard | Blocks out-of-step tools during workflow command AND prompt steps (hard enforce); soft enforce emits a reminder. Skills are intentionally unguarded. |
| PostToolUse | post-validate | Suppressed errors, leaked secrets, writes outside repo |
| PostToolUse | slop-detect | AI code patterns: doc/code imbalance, restating comments |
| PostToolUse | lang-review | Language-aware checks: Go, TypeScript, Rust, Python, Shell |
| SubagentStop | subagent-stop | Verifies subagent work before accepting |
| Stop | stop-gate | Merge conflicts, cross-domain test gaps, linter pass |
| Stop | devkit-stop-guard | Blocks session end during active workflows |
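
As a rough illustration of how a PreToolUse hook such as safety-check can veto a tool call, the sketch below maps a hook payload to an exit code, where 2 means "block" (the convention the architecture notes mention for devkit-guard). The patterns, the `decide` function, and the payload field names `tool_name` and `tool_input.command` are assumptions for this example; devkit's actual rules are not shown in this README.

```python
import re

# Illustrative patterns only; these are not devkit's real safety-check rules.
BLOCKED = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "recursive delete of /"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "DROP TABLE"),
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push"),
]


def decide(payload):
    """Return (exit_code, message): 2 blocks the tool call, 0 allows it."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # this sketch only inspects shell commands
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, label in BLOCKED:
        if pattern.search(command):
            return 2, f"blocked by safety-check sketch: {label}"
    return 0, ""


# Usage with a hand-written payload. A real hook script would instead read the
# JSON payload from stdin, print the message to stderr, and exit with the code.
sample = {"tool_name": "Bash", "tool_input": {"command": "git push --force origin main"}}
code, message = decide(sample)
print(code, message)
```

Keeping the decision in a pure function makes the rules easy to unit-test separately from the stdin/exit-code plumbing.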
| Agent | Model | Used by |
|---|---|---|
| reviewer | Opus | tri-review workflow, feature workflow |
| researcher | Sonnet | research, deep-research, tri-debug workflows |
| improver | Opus | self-improve, self-lint, self-perf, refactor workflows |
| test-writer | Sonnet | self-test, tri-test-gen workflows |
| documenter | Haiku | doc-gen skill |
| security-auditor | Opus | tri-security, pr-ready, audit workflows |
All agents run in worktree isolation.
Language-specific rules that auto-activate when Claude reads matching files. Installed to ~/.claude/rules/. Rules guide how to write; hooks catch what you missed.

/devkit:setup-rules

| Language | Examples |
|---|---|
| Go | Error wrapping, context.Context, defer traps, JSON float64 gotcha |
| TypeScript | unknown not any, discriminated unions, catch narrowing |
| Python | Exception chains, type hints, dataclasses, pathlib |
| Rust | Ownership, ? propagation, newtypes, clippy-as-errors |
| Shell | set -euo pipefail, quoting, macOS portability |
MCP Server (bin/devkit mcp, auto-started by plugin)
├── bin/devkit = POSIX shell wrapper (committed to git)
│   └── On first run, downloads matching release asset from GitHub,
│       verifies SHA256, caches as bin/devkit-engine-v<ver>-<os>-<arch>,
│       then execs it. Local dev builds (make install-plugin) are used
│       directly via the fast path.
├── Tools: devkit_start, devkit_advance, devkit_status, devkit_list
├── State: session.json (hot, <50ms reads) + SQLite (cold history)
├── Parse YAML → validate steps, branches, budget
├── Walk steps:
│   ├── Command steps → engine executes shell directly ($0 cost)
│   │   Values passed via $DEVKIT_INPUT / $DEVKIT_OUT_<step_id>
│   │   env vars, never interpolated into the command string.
│   ├── Prompt steps → Claude works, calls devkit_advance when done
│   ├── Loop with gate → run, verify, keep or revert
│   ├── Branch → case-insensitive word-boundary match → goto
│   └── Parallel → Agent tool dispatch (Claude/Codex/Gemini)
└── Principles injected per step (~120 tokens, not full skill files)

Enforcement:
├── MCP tool scoping → Claude can only call devkit_advance to progress
├── PreToolUse hook → exit 2 blocks tools during command steps
└── Stop hook → blocks session end during active workflows

Terminal usage (devkit workflow <name> "<description>"):
└── Subprocess runners for Codex/Gemini CLI usage
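
Two details from the "Walk steps" part of the tree above are easy to get wrong, so here is a hedged Python sketch of each: passing user text to a command step through an environment variable instead of splicing it into the command string, and choosing a branch with a case-insensitive word-boundary match. The function names `run_command_step` and `match_branch` and the branch table are hypothetical; only the `DEVKIT_INPUT` variable name comes from this README, and the real engine is written in Go.

```python
import os
import re
import subprocess


def run_command_step(command, user_input):
    """Run a fixed shell command; user text travels only in the environment."""
    env = dict(os.environ, DEVKIT_INPUT=user_input)
    result = subprocess.run(
        ["sh", "-c", command],  # the command string itself never contains user text
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def match_branch(output, branches):
    """Return the goto target of the first keyword found as a whole word."""
    for keyword, target in branches.items():
        if re.search(rf"\b{re.escape(keyword)}\b", output, re.IGNORECASE):
            return target
    return None


# Usage: shell metacharacters in the input are printed, not executed.
hostile = 'x"; echo pwned; "'
print(run_command_step('printf "%s" "$DEVKIT_INPUT"', hostile))

# Usage: "FAIL" matches the "fail" branch, while "passed" does not match "pass".
branches = {"pass": "done", "fail": "fix"}
print(match_branch("Result: FAIL, two tests broke", branches))
print(match_branch("all tests passed", branches))
```

The word-boundary check is what keeps a keyword like "pass" from firing on "passed" or "bypass", and the environment-variable hand-off is what keeps a task description from being parsed as shell syntax.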
devkit/
├── commands/ # Legacy (references/ only); new entry points go in skills/
├── skills/ # 38 skills (workflow triggers, principles, tools, utilities) + _principles.yml
├── agents/ # 6 agents (reviewer, researcher, improver, ...)
├── hooks/ # 12 hooks (safety, security, quality gates, workflow enforcement)
├── workflows/ # 21 YAML workflow definitions
├── resources/rules/ # Language-specific coding rules
├── src/ # Go engine + MCP server
│   ├── mcp/ # MCP server (tools, principles loader, session management)
│   ├── engine/ # YAML workflow engine (parser, executor, tests)
│   ├── runners/ # Codex, Gemini interfaces (terminal fallback)
│   ├── lib/ # DB, git, metrics, session state, reporting
│   └── cmd/ # CLI entry points (including `devkit mcp`)
├── bin/ # devkit wrapper (committed) + downloaded engine binaries (gitignored)
└── .github/workflows/ # CI (build+test+vet) + auto-release (6 platforms)
