
devkit

A deterministic development harness for Claude Code: MCP workflow engine, enforcement hooks, YAML workflows, and multi-agent consensus (Claude + Codex + Gemini)


Devkit

A deterministic development harness for AI agents. The MCP engine controls workflow execution (step ordering, gates, loops, branches). The agent handles creativity. Every step is enforced, measured, and auditable.

Works with just Claude. Optionally adds Codex and Gemini for multi-agent consensus.


Install

1. Devkit (required)

/plugin marketplace add 5uck1ess/marketplace
/plugin install devkit@5uck1ess-plugins

Auto-updates are enabled by default. Devkit updates itself when you restart Claude Code.

2. Multi-agent plugins (optional)

These enable the tri-* commands (tri-review, tri-debug, tri-security, etc.), which run Claude, Codex, and Gemini in parallel.

# Codex plugin
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex

# Gemini plugin
/plugin marketplace add abiswas97/gemini-plugin-cc
/plugin install gemini@abiswas97-gemini

If plugins aren't installed, the CLI fallbacks work too:

brew install codex gemini-cli

3. Companion plugins (optional)

These handle concerns devkit doesn't: methodology, specialized reviews, and context management. No overlap.

# Methodology: brainstorming, planning, TDD, verification, debugging
/plugin install superpowers@claude-plugins-official

# Specialized review agents: comment accuracy, type design, silent failures
/plugin install pr-review-toolkit@claude-plugins-official

# Deep feature exploration: parallel codebase analysis, architecture proposals
/plugin install feature-dev@claude-plugins-official

# Quick commits: /commit, /commit-push-pr, /clean_gone
/plugin install commit-commands@claude-plugins-official

# Hook creation: markdown rules, hot reload, conversation analysis
/plugin install hookify@claude-plugins-official

# Skill development: eval/benchmark framework, blind A/B testing
/plugin install skill-creator@claude-plugins-official

# Context window management: sandboxes large outputs, 98% token savings
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

4. Optional tools

brew install rtk       # Token optimization (60-90% savings on Bash output)
brew install ast-grep  # AST-based repo mapping (used by onboard skill)

# Browser automation: enables scrape (JS-rendered), screenshot, and browser skills
npx playwright install chromium

Playwright (optional) enables three skills: enhanced scrape for JS-heavy sites, screenshot for page captures, and browser for full automation (clicking, form filling, multi-step flows, codegen). Free and local, with no API keys. Install only the browsers you need (chromium is ~170MB).

Verify

/devkit:status

This shows which CLIs are installed, which agents are available, and which commands are ready.


Quick Start

# These activate automatically, just ask naturally:
# "write tests for src/parser.ts"
# "generate a changelog"
# "help me understand this codebase"
# "research the best auth library for Node"

# Slash commands for complex workflows:
/tri-review                   # Multi-agent code review
# Or just describe: "submit a PR", "ship this" → pr-ready skill auto-activates

How It Works

Devkit runs as an MCP server inside Claude Code. When a workflow starts, the engine takes control:

devkit_start("research", "best Go testing frameworks")
  → Engine creates session, returns Step 1 + condensed principles
  → Claude executes the step using standard tools
  → Claude calls devkit_advance(session_id)
  → Engine validates, records output, returns Step 2
  → ...repeat until WORKFLOW COMPLETE

Enforcement (runs automatically):
  PreToolUse hook → blocks out-of-step actions during command steps
  Stop hook → prevents session end during active workflows

Why MCP? Claude can't skip steps because the engine controls what comes next. Claude can't call tools that aren't valid for the current step. The engine holds state; Claude doesn't self-report.
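
The start/advance handshake described above can be sketched in a few lines. This is a minimal in-memory illustration, not devkit's Go engine: the `Engine` and `Session` classes, the step names, and the `output` argument to `advance` are all assumptions made for the example (the real `devkit_advance` is shown taking only a session ID).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Session:
    workflow: str
    steps: list[str]
    index: int = 0  # position of the step currently being worked on
    outputs: list[str] = field(default_factory=list)


class Engine:
    """Hypothetical stand-in for the workflow engine; it owns all state."""

    def __init__(self, workflows: dict[str, list[str]]):
        self._workflows = workflows
        self._sessions: dict[str, Session] = {}
        self._ids = count(1)

    def start(self, workflow: str, task: str) -> tuple[str, str]:
        """Create a session and return (session_id, first step)."""
        steps = self._workflows[workflow]
        session_id = f"s{next(self._ids)}"
        self._sessions[session_id] = Session(workflow, steps)
        return session_id, f"{steps[0]}: {task}"

    def advance(self, session_id: str, output: str) -> str:
        """Record the current step's output, then hand out the next step."""
        session = self._sessions[session_id]
        if session.index >= len(session.steps):
            raise RuntimeError("workflow already complete")
        if not output.strip():
            raise ValueError("step output is required before advancing")
        session.outputs.append(output)
        session.index += 1
        if session.index == len(session.steps):
            return "WORKFLOW COMPLETE"
        return session.steps[session.index]


engine = Engine({"research": ["clarify", "search", "synthesize"]})
sid, first = engine.start("research", "best Go testing frameworks")
second = engine.advance(sid, "scope agreed")
```

Because the caller only ever receives the next step from `advance`, it has no way to jump ahead or reorder steps, which is the property the paragraph above relies on.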


Commands

All skills are tab-completable slash commands in current Claude Code. The primary user-facing entry points:

Command | What it does
/tri-review | Code review from 1-3 agents, consolidated report
/tri-debug | Independent root-cause analysis from each agent
/tri-security | Security audit with severity-ranked consensus
/devkit:status | Health check
/devkit:setup-rules | Install language-specific coding rules to ~/.claude/rules/ (user-only; disable-model-invocation prevents auto-trigger)
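
As an illustration of what a consolidated, severity-ranked report could involve, the sketch below merges per-agent findings and ranks them by severity and then by how many agents agree. The finding fields (`file`, `issue`, `severity`), the severity scale, and the ranking rule are assumptions for the example; devkit's actual consolidation logic is not documented here.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def consolidate(reports: dict[str, list[dict]]) -> list[dict]:
    """Merge per-agent findings, then rank by severity and agreement."""
    merged: dict[tuple[str, str], dict] = {}
    for agent, findings in reports.items():
        for finding in findings:
            key = (finding["file"], finding["issue"])
            entry = merged.setdefault(
                key,
                {
                    "file": key[0],
                    "issue": key[1],
                    "severity": finding["severity"],
                    "agents": [],
                },
            )
            # Keep the most severe rating that any agent assigned.
            if SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[entry["severity"]]:
                entry["severity"] = finding["severity"]
            entry["agents"].append(agent)
    return sorted(
        merged.values(),
        key=lambda e: (SEVERITY_ORDER[e["severity"]], -len(e["agents"])),
    )


reports = {
    "claude": [
        {"file": "auth.py", "issue": "sql injection", "severity": "high"},
        {"file": "util.py", "issue": "unused import", "severity": "low"},
    ],
    "codex": [{"file": "auth.py", "issue": "sql injection", "severity": "critical"}],
    "gemini": [{"file": "auth.py", "issue": "sql injection", "severity": "high"}],
}
ranked = consolidate(reports)
```

With one to three agents available, the same function still works: a single-agent run simply produces entries with one name in `agents`.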

Every workflow also has a dedicated slash command: /feature, /bugfix, /audit, /refactor, /pr-ready, /self-*, etc. Tasks like "ship this PR" or "submit a PR" also auto-activate the pr-ready skill via natural language.

Workflows

All 21 YAML workflows are invoked via the MCP engine. Every workflow has a trigger skill so natural-language keywords dispatch deterministically: saying "build a feature", "fix this bug", "tri review", or "deep research X" fires the matching skill, which calls devkit_start, and the engine takes over.

Workflow | What it does
feature | Brainstorm, plan, implement, test, lint, review
bugfix | Reproduce, diagnose, fix, regression test, verify
refactor | Analyze smells, plan, restructure, verify nothing broke
research | Clarify, decompose, parallel search, corroborate, synthesize
deep-research | ACH: hypotheses, disconfirmation, evidence matrix
self-test | Run tests, fix failures, repeat until passing
self-lint | Run linter, fix violations, repeat until clean
self-perf | Benchmark, optimize, repeat until target met
self-improve | Run metric, fix issues, repeat until passing
self-migrate | Migrate code incrementally with test gate
self-audit | Measure codebase, rank improvements by evidence
autoloop | Autonomous audit/fix/measure/keep-or-revert loop
audit | Dependencies, vulnerabilities, licenses, lint, security
pr-ready | Full PR preparation pipeline
tri-review | Multi-agent code review
tri-debug | Multi-agent debugging
tri-security | Multi-agent security audit
tri-dispatch | Send any task to multiple agents
test-gen | Generate tests via test-writer agent, iterate until passing
doc-gen | Generate docs via documenter agent
onboard | Generate codebase onboarding guide via researcher agent
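
The self-* and autoloop rows share one pattern: run, measure against a gate, then keep or revert. A minimal sketch of that loop follows, assuming a gate that returns a failure count and a fix function that proposes a new state. Both callables, the dict-based state, and the iteration budget are hypothetical; a real run would presumably revert through version control and not by discarding a copy.

```python
from typing import Callable


def run_gated_loop(
    state: dict,
    attempt_fix: Callable[[dict], dict],
    gate: Callable[[dict], int],
    max_iterations: int = 5,
) -> dict:
    """Apply fixes until the gate reports zero failures or the budget runs out.

    A candidate is kept only if it lowers the failure count; otherwise it is
    dropped, which models the keep-or-revert decision.
    """
    best = gate(state)
    for _ in range(max_iterations):
        if best == 0:
            break
        candidate = attempt_fix(dict(state))  # work on a copy
        score = gate(candidate)
        if score < best:
            state, best = candidate, score  # keep the improvement
        # otherwise the candidate is discarded and state stays untouched
    return state


def failing_count(s: dict) -> int:
    return s["failing"]


def good_fix(s: dict) -> dict:
    return {"failing": s["failing"] - 1}


def bad_fix(s: dict) -> dict:
    return {"failing": s["failing"] + 1}


fixed = run_gated_loop({"failing": 3}, good_fix, failing_count)
unchanged = run_gated_loop({"failing": 3}, bad_fix, failing_count)
```

The gate is the only thing that decides whether a change survives, so a fix that makes the metric worse never accumulates.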

Skills

Skills activate automatically based on context. No slash command needed. Every workflow has a matching trigger skill: saying the keyword dispatches to the engine, which then enforces every step.

Workflow trigger skills (dispatch to engine-enforced workflows):

Trigger | Skill → Workflow
"build a feature", "new feature X" | feature
"fix this bug", "this is broken" | bugfix
"refactor this", "clean up X" | refactor
"audit this project", "project health" | audit
"research X" | research
"deep research", "validate this" | deep-research
"make a PR", "ship this", "create a pull request" | pr-ready
"tri review", "triple review" | tri-review
"tri debug", "triple debug" | tri-debug
"tri security", "triple security audit" | tri-security
"tri dispatch", "send to three models" | tri-dispatch
"self-audit", "audit the codebase" | self-audit
"self-improve", "keep fixing until X passes" | self-improve
"self-lint", "fix all lint" | self-lint
"self-migrate", "migrate incrementally" | self-migrate
"self-perf", "optimize performance" | self-perf
"self-test", "fix failing tests" | self-test
"autoloop", "run experiments overnight" | autoloop
"write tests for X" | test-gen
"document this module" | doc-gen
"onboard to this codebase" | onboard

Other skills (tools, meta-orchestration, content):

Trigger | Skill
"generate a changelog" | changelog
"create an ADR" | adr
"mega PR review" | mega-pr (dispatches tri-review + pr-review-toolkit in parallel)
"scrape this URL" | scrape
"screenshot this page" | screenshot (requires Playwright)
"automate this browser flow" | browser (requires Playwright)
Google Workspace CLI commands | gcli

Coding principles (clean-code, dry, yagni, dont-reinvent, executing, stuck, scratchpad) are injected as condensed rules (~120 tokens) per workflow step, not loaded as full skill files.


Hooks

12 hooks across 4 lifecycle events. All installed automatically with the plugin.

Event | Hook | What it catches
PreToolUse | safety-check | rm -rf /, DROP TABLE, force push, editing secrets
PreToolUse | security-patterns | eval(), XSS, shell injection, weak hashes, hardcoded secrets
PreToolUse | audit-trail | Logs every command to .devkit/audit.log
PreToolUse | pr-gate | Prompts to run the pr-ready skill before gh pr create
PreToolUse | rtk-rewrite | Compresses Bash output via RTK (no-op if not installed)
PreToolUse | devkit-guard | Blocks out-of-step tools during workflow command AND prompt steps (hard enforce); soft enforce emits a reminder. Skills are intentionally unguarded.
PostToolUse | post-validate | Suppressed errors, leaked secrets, writes outside repo
PostToolUse | slop-detect | AI code patterns: doc/code imbalance, restating comments
PostToolUse | lang-review | Language-aware checks: Go, TypeScript, Rust, Python, Shell
SubagentStop | subagent-stop | Verifies subagent work before accepting
Stop | stop-gate | Merge conflicts, cross-domain test gaps, linter pass
Stop | devkit-stop-guard | Blocks session end during active workflows
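
To make the PreToolUse rows more concrete, here is a rough sketch of how a safety-check style hook could decide whether to block a Bash command. Claude Code passes hook input as JSON on stdin (including `tool_name` and `tool_input`), and exit code 2 from a PreToolUse hook blocks the tool call. The patterns and the `check` helper below are illustrative assumptions, not devkit's actual rules.

```python
import json
import re
import sys

# Illustrative patterns only; the real safety-check rules are not shown here.
BLOCKED = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "rm -rf on the filesystem root"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "DROP TABLE"),
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push"),
]


def check(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern, label in BLOCKED:
        if pattern.search(command):
            return 2, f"blocked: {label}"
    return 0, ""


def main() -> int:
    """Entry point for a hook script: read the event from stdin, report on stderr."""
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

A script installed as a hook would finish with `sys.exit(main())`; keeping the decision in `check` makes the rules testable without a live session.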

Agents

Agent | Model | Used by
reviewer | Opus | tri-review workflow, feature workflow
researcher | Sonnet | research, deep-research, tri-debug workflows
improver | Opus | self-improve, self-lint, self-perf, refactor workflows
test-writer | Sonnet | self-test, test-gen workflows
documenter | Haiku | doc-gen skill
security-auditor | Opus | tri-security, pr-ready, audit workflows

All agents run in worktree isolation.


Coding Rules

Language-specific rules that auto-activate when Claude reads matching files. Installed to ~/.claude/rules/. Rules guide how to write; hooks catch what you missed.

/devkit:setup-rules

Language | Examples
Go | Error wrapping, context.Context, defer traps, JSON float64 gotcha
TypeScript | unknown not any, discriminated unions, catch narrowing
Python | Exception chains, type hints, dataclasses, pathlib
Rust | Ownership, ? propagation, newtypes, clippy-as-errors
Shell | set -euo pipefail, quoting, macOS portability

Architecture

MCP Server (bin/devkit mcp, auto-started by plugin)
  ├── bin/devkit = POSIX shell wrapper (committed to git)
  │   └── On first run, downloads matching release asset from GitHub,
  │       verifies SHA256, caches as bin/devkit-engine-v<ver>-<os>-<arch>,
  │       then execs it. Local dev builds (make install-plugin) are used
  │       directly via the fast path.
  ├── Tools: devkit_start, devkit_advance, devkit_status, devkit_list
  ├── State: session.json (hot, <50ms reads) + SQLite (cold history)
  ├── Parse YAML → validate steps, branches, budget
  ├── Walk steps:
  │   ├── Command steps → engine executes shell directly ($0 cost)
  │   │   Values passed via $DEVKIT_INPUT / $DEVKIT_OUT_<step_id>
  │   │   env vars, never interpolated into the command string.
  │   ├── Prompt steps → Claude works, calls devkit_advance when done
  │   ├── Loop with gate → run, verify, keep or revert
  │   ├── Branch → case-insensitive word-boundary match → goto
  │   └── Parallel → Agent tool dispatch (Claude/Codex/Gemini)
  └── Principles injected per step (~120 tokens, not full skill files)

Enforcement:
  ├── MCP tool scoping: Claude can only call devkit_advance to progress
  ├── PreToolUse hook: exit 2 blocks tools during command steps
  └── Stop hook: blocks session end during active workflows

Terminal usage (devkit workflow <name> "<description>"):
  └── Subprocess runners for Codex/Gemini CLI usage
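
Two of the step kinds above lend themselves to a short sketch: command steps that receive values only through environment variables, and branches chosen by a case-insensitive word-boundary match. The helper names `run_command_step` and `pick_branch`, the branch table, and the use of `sh -c` are assumptions for illustration; only the `DEVKIT_INPUT` variable name comes from the diagram.

```python
import os
import re
import subprocess


def run_command_step(command: str, user_input: str) -> str:
    """Run a fixed shell command; untrusted text travels only via the environment."""
    env = {**os.environ, "DEVKIT_INPUT": user_input}
    result = subprocess.run(
        ["sh", "-c", command], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


def pick_branch(output: str, branches: dict[str, str], default: str) -> str:
    """Return the target step of the first keyword found as a whole word."""
    for keyword, target in branches.items():
        if re.search(rf"\b{re.escape(keyword)}\b", output, re.IGNORECASE):
            return target
    return default


# The input contains shell metacharacters, but it is never parsed as shell code.
echoed = run_command_step('printf "%s" "$DEVKIT_INPUT"', "x; echo injected")
target = pick_branch("Verdict: FAIL", {"pass": "ship", "fail": "fix"}, "review")
```

Since the command string is constant and the value is expanded inside double quotes by the shell, the `; echo injected` part stays plain text. The word-boundary match means "bypass" would not trigger the "pass" branch.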

Repository Structure

devkit/
├── commands/          # Legacy (references/ only); new entry points go in skills/
├── skills/            # 38 skills (workflow triggers, principles, tools, utilities) + _principles.yml
├── agents/            # 6 agents (reviewer, researcher, improver, ...)
├── hooks/             # 12 hooks (safety, security, quality gates, workflow enforcement)
├── workflows/         # 21 YAML workflow definitions
├── resources/rules/   # Language-specific coding rules
├── src/               # Go engine + MCP server
│   ├── mcp/           # MCP server (tools, principles loader, session management)
│   ├── engine/        # YAML workflow engine (parser, executor, tests)
│   ├── runners/       # Codex, Gemini interfaces (terminal fallback)
│   ├── lib/           # DB, git, metrics, session state, reporting
│   └── cmd/           # CLI entry points (including `devkit mcp`)
├── bin/               # devkit wrapper (committed) + downloaded engine binaries (gitignored)
└── .github/workflows/ # CI (build+test+vet) + auto-release (6 platforms)

Release History

Version | Changes | Urgency | Date
v2.1.36 | Merged: ci(release): bump mcpb/manifest.json in lockstep + rebuild bundle | High | 5/22/2026
v2.1.32 | Merged: test(mcp): add stdout regression test for devkit mcp | High | 5/19/2026
v2.1.29 | Merged: feat(cmd): add approve subcommand for workflow gates | High | 4/18/2026
v2.1.28 | Merged: Probe-local tri-review follow-ups: Status enum + shared probe + hint/test/contract fixes | High | 4/17/2026
v2.1.27 | Merged: feat: local runner health probe | High | 4/17/2026
v2.1.26 | Merged: fix(guard): cross-repo scope + companion-rescue Bash for tri-review | High | 4/17/2026
v2.1.25 | Merged: fix(guard): allow Agent/Task dispatch on prompt+hard steps | High | 4/17/2026
v2.1.24 | Merged: fix: scope stop-guard to originating repo + force tri-* dispatch | High | 4/15/2026
v2.1.23 | Merged: feat: local runner + stealth scraping backends (Camoufox, Scweet) | High | 4/15/2026
v2.1.22 | Merged: feat: harness-audit workflow + expanded language rules | High | 4/13/2026
v2.1.21 | Merged: fix(skills): remove unquoted ": " from 10 SKILL.md descriptions | Medium | 4/11/2026
v2.1.20 | Merged: fix: populate agent bodies, sync mcpb version, update layout docs | High | 4/11/2026
v2.1.19 | Merged: fix(readme): standardize bare slash commands, rename status→health | Medium | 4/11/2026
v2.1.18 | Merged: refactor(engine): enforce type design for EnforceMode (closes #81) | High | 4/11/2026
v2.1.17 | Merged: docs: add CLAUDE.md as token-efficient navigation map | Medium | 4/11/2026
v2.1.16 | Merged: feat(engine): per-step enforce override + surgical soft-flips (#78) | Medium | 4/11/2026
v2.1.15 | Merged: docs(skills): description-quality pass across 13 collision-prone skills | Medium | 4/11/2026
v2.1.14 | Merged: refactor(plugin): migrate commands/ to skills/ | Medium | 4/11/2026
v2.1.13 | Merged: fix(hooks): resolve #73 #74 #75 follow-ups (test harness, path handling, set -euo) | Medium | 4/11/2026
v2.1.12 | Merged: fix(hooks): three silent-failure bugs found during hook audit | Medium | 4/11/2026
v2.1.11 | Merged: feat(skills): deterministic dispatch for every devkit workflow | Medium | 4/11/2026
v2.1.10 | Merged: fix(workflows): tri-review + tri-security enforce: soft | Medium | 4/11/2026
v2.1.9 | Merged: fix(hooks): native devkit-engine guard subcommand (closes #65) | Medium | 4/11/2026
v2.1.8 | Merged: fix(hooks): enforce workflow progression on prompt steps + orphan recovery | Medium | 4/11/2026
v2.1.7 | Merged: fix(windows): real Go launcher for devkit MCPB bundle (closes #60) | Medium | 4/10/2026
v2.1.6 | Merged: bin/devkit: fix Windows first-run install (curl+schannel bug, #58) | Medium | 4/10/2026
v2.1.5 | Merged: hooks: auto-ignore .devkit/ in host repos on first run | Medium | 4/10/2026
v2.1.4 | Merged: feat(pr-ready): add doc-check step to the workflow | Medium | 4/10/2026
v2.1.3 | Merged: fix(wrapper): make engine download crash-safe and resumable | Medium | 4/10/2026
v2.1.2 | Merged: fix: bootstrap engine binary on first run via committed wrapper | Medium | 4/10/2026
v2.1.1 | Merged: feat: Playwright skills (screenshot, browser) + scrape backend | Medium | 4/10/2026
v2.1.0 | Merged: MCP engine: deterministic workflow enforcement via tool scoping + hooks | Medium | 4/10/2026
v2.0.39 | Merged: Remove docs/specs from public repo | Medium | 4/10/2026
v2.0.38 | Merged: Pass version tag to binary build for correct devkit --version | Medium | 4/10/2026
v2.0.37 | Merged: Fix publish job: cd to /tmp broke gh release upload | Medium | 4/10/2026
v2.0.36 | Merged: Add changelog for deterministic workflow conversion | Medium | 4/10/2026
v2.0.35 | Merged: Fix stale command references in README and status.md | Medium | 4/10/2026
v2.0.34 | Merged: PR 6: Add expect field to engine for command step assertions | Medium | 4/10/2026
v2.0.33 | Merged: PR 5: Trim commands - delete 16, keep 8 entry points | Medium | 4/9/2026
v2.0.32 | Merged: Add mega-pr skill for combined parallel PR review | Medium | 4/9/2026
v2.0.31 | Merged: PR 4: Convert remaining 12 commands to thin wrappers | Medium | 4/9/2026
v2.0.30 | Merged: PR 3: Convert bugfix and feature commands to thin wrappers | Medium | 4/9/2026
v2.0.29 | Merged: PR 2: Convert self-improvement loops to deterministic command+gate | Medium | 4/9/2026
v2.0.28 | Merged: PR 1: Convert research skills to deterministic YAML wrappers | Medium | 4/9/2026
v2.0.27 | Merged: Fix marketplace install to use github shorthand for Update Now support | Medium | 4/9/2026
v2.0.26 | Merged: Add ERR trap logging and migrate to [[ ]] in stop-gate.sh | Medium | 4/9/2026
v2.0.25 | Merged: Fix stop-gate infinite loop on large TS projects + README rewrite | Medium | 4/9/2026
v2.0.24 | Merged: Add deterministic command steps and loop gates to workflow engine | Medium | 4/9/2026
v2.0.23 | Merged: Add domain probes, stub detection, and symptom triage | Medium | 4/8/2026
v2.0.22 | Merged: Supplement commands with autoresearch-inspired patterns | Medium | 4/7/2026
v2.0.21 | Merged: Fix tri-agent failures on large diffs | Medium | 4/7/2026
v2.0.20 | Merged: Add setup-rules command and coding rule reference files | Medium | 4/6/2026
