devkit

A deterministic development harness for Claude Code: MCP workflow engine, enforcement hooks, YAML workflows, and multi-agent consensus (Claude + Codex + Gemini)

README

Devkit

A deterministic development harness for AI agents. The MCP engine controls workflow execution (step ordering, gates, loops, branches). The agent handles creativity. Every step is enforced, measured, and auditable.

Works with just Claude. Optionally adds Codex and Gemini for multi-agent consensus.


Install

1. Devkit (required)

/plugin marketplace add 5uck1ess/marketplace
/plugin install devkit@5uck1ess-plugins

Auto-updates are enabled by default. Devkit updates itself when you restart Claude Code.

2. Multi-agent plugins (optional)

These enable the tri-* commands (tri-review, tri-debug, tri-security, etc.) to run Claude + Codex + Gemini in parallel.

# Codex plugin
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex

# Gemini plugin
/plugin marketplace add abiswas97/gemini-plugin-cc
/plugin install gemini@abiswas97-gemini

If plugins aren't installed, the CLI fallbacks work too:

brew install codex gemini-cli

3. Companion plugins (optional)

These handle concerns devkit doesn't: methodology, specialized reviews, and context management. No overlap.

# Methodology: brainstorming, planning, TDD, verification, debugging
/plugin install superpowers@claude-plugins-official

# Specialized review agents: comment accuracy, type design, silent failures
/plugin install pr-review-toolkit@claude-plugins-official

# Deep feature exploration: parallel codebase analysis, architecture proposals
/plugin install feature-dev@claude-plugins-official

# Quick commits: /commit, /commit-push-pr, /clean_gone
/plugin install commit-commands@claude-plugins-official

# Hook creation: markdown rules, hot reload, conversation analysis
/plugin install hookify@claude-plugins-official

# Skill development: eval/benchmark framework, blind A/B testing
/plugin install skill-creator@claude-plugins-official

# Context window management: sandboxes large outputs, 98% token savings
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

4. Optional tools

brew install rtk       # Token optimization (60-90% savings on Bash output)
brew install ast-grep  # AST-based repo mapping (used by onboard skill)

# Browser automation: enables scrape (JS-rendered), screenshot, and browser skills
npx playwright install chromium

Playwright (optional) enables three skills: enhanced scrape for JS-heavy sites, screenshot for page captures, and browser for full automation (clicking, form filling, multi-step flows, codegen). It is free and runs locally, with no API keys. Install only the browsers you need (chromium is ~170MB).

Verify

/devkit:status

This shows which CLIs are installed, which agents are available, and which commands are ready.


Quick Start

# These activate automatically; just ask naturally:
# "write tests for src/parser.ts"
# "generate a changelog"
# "help me understand this codebase"
# "research the best auth library for Node"

# Slash commands for complex workflows:
/tri-review                   # Multi-agent code review
# Or just describe: "submit a PR", "ship this" → pr-ready skill auto-activates

How It Works

Devkit runs as an MCP server inside Claude Code. When a workflow starts, the engine takes control:

devkit_start("research", "best Go testing frameworks")
  → Engine creates session, returns Step 1 + condensed principles
  → Claude executes the step using standard tools
  → Claude calls devkit_advance(session_id)
  → Engine validates, records output, returns Step 2
  → ...repeat until WORKFLOW COMPLETE

Enforcement (runs automatically):
  PreToolUse hook → blocks out-of-step actions during command steps
  Stop hook → prevents session end during active workflows

Why MCP? Claude can't skip steps because the engine controls what comes next. Claude can't call tools that aren't valid for the current step. The engine holds state โ€” Claude doesn't self-report.
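
The handshake above can be modeled in a few lines. This is a toy sketch, not devkit's engine (which is written in Go): the `Engine` and `Session` classes, the step names, and the extra `output` argument to `advance` are all assumptions made for illustration. The point it shows is that the step index lives in the engine, so the caller can only ever receive the next step.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class Session:
    task: str                             # free-text description passed at start
    steps: list[str]                      # ordered step names (hypothetical)
    index: int = 0                        # current position, owned by the engine
    outputs: dict[str, str] = field(default_factory=dict)


class Engine:
    """Toy stand-in for the MCP engine; names and behavior are assumptions."""

    def __init__(self, workflows: dict[str, list[str]]):
        self.workflows = workflows
        self.sessions: dict[str, Session] = {}

    def start(self, workflow: str, task: str) -> tuple[str, str]:
        """Create a session and return its id together with the first step."""
        session_id = uuid.uuid4().hex
        session = Session(task=task, steps=list(self.workflows[workflow]))
        self.sessions[session_id] = session
        return session_id, session.steps[0]

    def advance(self, session_id: str, output: str) -> str:
        """Validate and record the current step's output, then return the next step."""
        session = self.sessions[session_id]
        if session.index >= len(session.steps):
            return "WORKFLOW COMPLETE"
        if not output.strip():
            raise ValueError("step output is empty; refusing to advance")
        session.outputs[session.steps[session.index]] = output
        session.index += 1
        if session.index == len(session.steps):
            return "WORKFLOW COMPLETE"
        return session.steps[session.index]
```

Because the caller never passes a step name or index, there is no way to request step 3 before step 2 has been recorded.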


Commands

All skills are tab-completable slash commands in current Claude Code. The primary user-facing entry points:

| Command | What it does |
| --- | --- |
| /tri-review | Code review from 1-3 agents, consolidated report |
| /tri-debug | Independent root-cause analysis from each agent |
| /tri-security | Security audit with severity-ranked consensus |
| /devkit:status | Health check |
| /devkit:setup-rules | Install language-specific coding rules to ~/.claude/rules/ (user-only; disable-model-invocation prevents auto-trigger) |

Every workflow also has a dedicated slash command: /feature, /bugfix, /audit, /refactor, /pr-ready, /self-*, etc. Tasks like "ship this PR" or "submit a PR" also auto-activate the pr-ready skill via natural language.

Workflows

All 21 YAML workflows are invoked via the MCP engine. Every workflow has a trigger skill so natural-language keywords dispatch deterministically: saying "build a feature", "fix this bug", "tri review", or "deep research X" fires the matching skill, which calls devkit_start, and the engine takes over.

| Workflow | What it does |
| --- | --- |
| feature | Brainstorm, plan, implement, test, lint, review |
| bugfix | Reproduce, diagnose, fix, regression test, verify |
| refactor | Analyze smells, plan, restructure, verify nothing broke |
| research | Clarify, decompose, parallel search, corroborate, synthesize |
| deep-research | ACH: hypotheses, disconfirmation, evidence matrix |
| self-test | Run tests, fix failures, repeat until passing |
| self-lint | Run linter, fix violations, repeat until clean |
| self-perf | Benchmark, optimize, repeat until target met |
| self-improve | Run metric, fix issues, repeat until passing |
| self-migrate | Migrate code incrementally with test gate |
| self-audit | Measure codebase, rank improvements by evidence |
| autoloop | Autonomous audit/fix/measure/keep-or-revert loop |
| audit | Dependencies, vulnerabilities, licenses, lint, security |
| pr-ready | Full PR preparation pipeline |
| tri-review | Multi-agent code review |
| tri-debug | Multi-agent debugging |
| tri-security | Multi-agent security audit |
| tri-dispatch | Send any task to multiple agents |
| test-gen | Generate tests via test-writer agent, iterate until passing |
| doc-gen | Generate docs via documenter agent |
| onboard | Generate codebase onboarding guide via researcher agent |
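
The self-* workflows share one shape: check a gate, attempt a fix, and repeat until the gate passes. A minimal sketch of that loop follows, assuming a hypothetical `max_iterations` budget; the function name and signature are illustrative and not taken from devkit's engine.

```python
from typing import Callable


def run_until_passing(
    gate: Callable[[], bool],
    attempt_fix: Callable[[], None],
    max_iterations: int = 5,
) -> int:
    """Return how many fix attempts were needed for the gate to pass.

    Raises RuntimeError if the gate still fails once the budget is spent.
    """
    for attempts_made in range(max_iterations):
        if gate():                 # e.g. "do the tests pass?" or "is lint clean?"
            return attempts_made
        attempt_fix()              # e.g. the agent edits code for one iteration
    if gate():                     # final check after the last permitted fix
        return max_iterations
    raise RuntimeError(f"gate still failing after {max_iterations} fix attempts")
```

The gate is evaluated before every fix, so a project that already passes costs zero fix attempts, and the budget guarantees the loop terminates.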

Skills

Skills activate automatically based on context. No slash command needed. Every workflow has a matching trigger skill: saying the keyword dispatches to the engine, which then enforces every step.

Workflow trigger skills (dispatch to engine-enforced workflows):

| Trigger | Skill → Workflow |
| --- | --- |
| "build a feature", "new feature X" | feature |
| "fix this bug", "this is broken" | bugfix |
| "refactor this", "clean up X" | refactor |
| "audit this project", "project health" | audit |
| "research X" | research |
| "deep research", "validate this" | deep-research |
| "make a PR", "ship this", "create a pull request" | pr-ready |
| "tri review", "triple review" | tri-review |
| "tri debug", "triple debug" | tri-debug |
| "tri security", "triple security audit" | tri-security |
| "tri dispatch", "send to three models" | tri-dispatch |
| "self-audit", "audit the codebase" | self-audit |
| "self-improve", "keep fixing until X passes" | self-improve |
| "self-lint", "fix all lint" | self-lint |
| "self-migrate", "migrate incrementally" | self-migrate |
| "self-perf", "optimize performance" | self-perf |
| "self-test", "fix failing tests" | self-test |
| "autoloop", "run experiments overnight" | autoloop |
| "write tests for X" | test-gen |
| "document this module" | doc-gen |
| "onboard to this codebase" | onboard |

Other skills (tools, meta-orchestration, content):

| Trigger | Skill |
| --- | --- |
| "generate a changelog" | changelog |
| "create an ADR" | adr |
| "mega PR review" | mega-pr (dispatches tri-review + pr-review-toolkit in parallel) |
| "scrape this URL" | scrape |
| "screenshot this page" | screenshot (requires Playwright) |
| "automate this browser flow" | browser (requires Playwright) |
| Google Workspace CLI commands | gcli |

Coding principles (clean-code, dry, yagni, dont-reinvent, executing, stuck, scratchpad) are injected as condensed rules (~120 tokens) per workflow step, not loaded as full skill files.


Hooks

12 hooks across 4 lifecycle events. All installed automatically with the plugin.

| Event | Hook | What it catches |
| --- | --- | --- |
| PreToolUse | safety-check | rm -rf /, DROP TABLE, force push, editing secrets |
| PreToolUse | security-patterns | eval(), XSS, shell injection, weak hashes, hardcoded secrets |
| PreToolUse | audit-trail | Logs every command to .devkit/audit.log |
| PreToolUse | pr-gate | Prompts to run the pr-ready skill before gh pr create |
| PreToolUse | rtk-rewrite | Compresses Bash output via RTK (no-op if not installed) |
| PreToolUse | devkit-guard | Blocks out-of-step tools during workflow command AND prompt steps (hard enforce); soft enforce emits a reminder. Skills are intentionally unguarded. |
| PostToolUse | post-validate | Suppressed errors, leaked secrets, writes outside repo |
| PostToolUse | slop-detect | AI code patterns: doc/code imbalance, restating comments |
| PostToolUse | lang-review | Language-aware checks: Go, TypeScript, Rust, Python, Shell |
| SubagentStop | subagent-stop | Verifies subagent work before accepting |
| Stop | stop-gate | Merge conflicts, cross-domain test gaps, linter pass |
| Stop | devkit-stop-guard | Blocks session end during active workflows |
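
For readers unfamiliar with how a blocking hook works, here is a minimal sketch of a PreToolUse check in the spirit of safety-check. Claude Code passes the pending tool call to the hook as JSON on stdin (including `tool_name` and `tool_input`), and an exit code of 2 blocks the call, with stderr fed back to the model. The patterns and function names below are illustrative assumptions; devkit's real hook rules are not reproduced in this README.

```python
import json
import re
import sys

# Illustrative patterns only (assumptions, not devkit's actual rule set).
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),               # rm -rf on the filesystem root
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),    # destructive SQL
    re.compile(r"\bgit\s+push\b.*(--force|\s-f\b)"),   # force push
]


def decide(payload: dict) -> tuple[int, str]:
    """Map a PreToolUse payload to (exit_code, stderr_message)."""
    if payload.get("tool_name") != "Bash":
        return 0, ""                                   # only shell commands are inspected
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return 2, f"blocked: command matches {pattern.pattern!r}"
    return 0, ""


def main() -> None:
    """Hook entry point: read the payload from stdin, exit 2 to block the call."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To use this as an actual hook script, call `main()` at the bottom of the file. Keeping the decision in a pure function makes the rules testable without spawning a process.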

Agents

| Agent | Model | Used by |
| --- | --- | --- |
| reviewer | Opus | tri-review workflow, feature workflow |
| researcher | Sonnet | research, deep-research, tri-debug workflows |
| improver | Opus | self-improve, self-lint, self-perf, refactor workflows |
| test-writer | Sonnet | self-test, test-gen workflows |
| documenter | Haiku | doc-gen skill |
| security-auditor | Opus | tri-security, pr-ready, audit workflows |

All agents run in worktree isolation.


Coding Rules

Language-specific rules that auto-activate when Claude reads matching files. Installed to ~/.claude/rules/. Rules guide how to write; hooks catch what you missed.

/devkit:setup-rules

| Language | Examples |
| --- | --- |
| Go | Error wrapping, context.Context, defer traps, JSON float64 gotcha |
| TypeScript | unknown not any, discriminated unions, catch narrowing |
| Python | Exception chains, type hints, dataclasses, pathlib |
| Rust | Ownership, ? propagation, newtypes, clippy-as-errors |
| Shell | set -euo pipefail, quoting, macOS portability |

Architecture

MCP Server (bin/devkit mcp, auto-started by plugin)
  ├── bin/devkit = POSIX shell wrapper (committed to git)
  │   └── On first run, downloads matching release asset from GitHub,
  │       verifies SHA256, caches as bin/devkit-engine-v<ver>-<os>-<arch>,
  │       then execs it. Local dev builds (make install-plugin) are used
  │       directly via the fast path.
  ├── Tools: devkit_start, devkit_advance, devkit_status, devkit_list
  ├── State: session.json (hot, <50ms reads) + SQLite (cold history)
  ├── Parse YAML → validate steps, branches, budget
  ├── Walk steps:
  │   ├── Command steps → engine executes shell directly ($0 cost)
  │   │   Values passed via $DEVKIT_INPUT / $DEVKIT_OUT_<step_id>
  │   │   env vars, never interpolated into the command string.
  │   ├── Prompt steps → Claude works, calls devkit_advance when done
  │   ├── Loop with gate → run, verify, keep or revert
  │   ├── Branch → case-insensitive word-boundary match → goto
  │   └── Parallel → Agent tool dispatch (Claude/Codex/Gemini)
  └── Principles injected per step (~120 tokens, not full skill files)

Enforcement:
  ├── MCP tool scoping: Claude can only call devkit_advance to progress
  ├── PreToolUse hook: exit 2 blocks tools during command steps
  └── Stop hook: blocks session end during active workflows

Terminal usage (devkit workflow <name> "<description>"):
  └── Subprocess runners for Codex/Gemini CLI usage
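
Two details in the step walker are easy to get wrong, so here is a hedged sketch of each. The first function shows why passing values through `$DEVKIT_INPUT` rather than interpolating them matters: the shell never parses the untrusted value as code. The second shows a case-insensitive word-boundary match for branch selection. Both function names and the `cases` mapping are assumptions for illustration; the real engine is written in Go and may differ.

```python
import os
import re
import subprocess
from typing import Optional


def run_command_step(command: str, user_input: str) -> str:
    """Run a fixed shell command; the untrusted value travels only via the environment."""
    env = dict(os.environ, DEVKIT_INPUT=user_input)
    result = subprocess.run(
        ["sh", "-c", command],     # the command string itself never contains user_input
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def match_branch(output: str, cases: dict[str, str]) -> Optional[str]:
    """Return the goto target of the first case keyword found as a whole word."""
    for keyword, target in cases.items():
        if re.search(rf"\b{re.escape(keyword)}\b", output, re.IGNORECASE):
            return target
    return None
```

With this shape, an input such as `x; echo injected` is printed verbatim by `printf "%s" "$DEVKIT_INPUT"` instead of being executed, and a keyword like `pass` matches "Tests PASS" but not "bypass".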

Repository Structure

devkit/
├── commands/          # Legacy (references/ only); new entry points go in skills/
├── skills/            # 38 skills (workflow triggers, principles, tools, utilities) + _principles.yml
├── agents/            # 6 agents (reviewer, researcher, improver, ...)
├── hooks/             # 12 hooks (safety, security, quality gates, workflow enforcement)
├── workflows/         # 21 YAML workflow definitions
├── resources/rules/   # Language-specific coding rules
├── src/               # Go engine + MCP server
│   ├── mcp/           # MCP server (tools, principles loader, session management)
│   ├── engine/        # YAML workflow engine (parser, executor, tests)
│   ├── runners/       # Codex, Gemini interfaces (terminal fallback)
│   ├── lib/           # DB, git, metrics, session state, reporting
│   └── cmd/           # CLI entry points (including `devkit mcp`)
├── bin/               # devkit wrapper (committed) + downloaded engine binaries (gitignored)
└── .github/workflows/ # CI (build+test+vet) + auto-release (6 platforms)

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| v2.1.29 | Merged: feat(cmd): add approve subcommand for workflow gates | High | 4/18/2026 |
| v2.1.24 | Merged: fix: scope stop-guard to originating repo + force tri-* dispatch | High | 4/15/2026 |
| v2.1.22 | Merged: feat: harness-audit workflow + expanded language rules | High | 4/13/2026 |
| v2.1.20 | Merged: fix: populate agent bodies, sync mcpb version, update layout docs | High | 4/11/2026 |
| v2.1.18 | Merged: refactor(engine): enforce type design for EnforceMode (closes #81) | High | 4/11/2026 |

Similar Packages

| Package | Description | Version |
| --- | --- | --- |
| zotero-mcp-lite | 🚀 Run a high-performance MCP server for Zotero, enabling customizable workflows without cloud dependency or API keys. | main@2026-04-21 |
| mcp-tidy | 🧹 Simplify your MCP servers with mcp-tidy, clearing server bloat to enhance performance and improve tool selection in Claude Code. | main@2026-04-21 |
| sqltools_mcp | 🔌 Access multiple databases seamlessly with SQLTools MCP, a versatile service supporting MySQL, PostgreSQL, SQL Server, DM8, and SQLite without multiple servers. | main@2026-04-21 |
| opentabs | Browser automation clicks buttons. OpenTabs calls APIs. | main@2026-04-20 |
| ralphglasses | Multi-LLM agent orchestration TUI: parallel Claude/Gemini/Codex sessions, 126 MCP tools | v0.2.0 |