be-my-butler

Orchestrate multiple agents to execute Claude Code workflows with cross-model verification for reliable AI code automation.

README

[EN] | 한국어 | 日本語 | 繁體中文

BMB: Be My Butler

Multi-agent orchestration for Claude Code with cross-model blind verification


Other AI coding tools optimize for speed. BMB optimizes for correctness.


Why BMB?

Solo AI coding assistants are fast, but they hallucinate, skip edge cases, and approve their own work. BMB fixes this by running multiple specialized agents that challenge, verify, and compress each other's output.

| Problem | BMB's Solution |
| --- | --- |
| Self-review bias | Cross-model blind verification: a different model reviews without seeing the original reasoning |
| Design tunnel vision | Council debate with AI challengers arguing alternatives before a single line is written |
| Context explosion | 3-layer compression protocol keeps token budgets tight across long pipelines |
| "Works for me" testing | Divergent framing: the verifier receives a deliberately reworded spec to catch assumption leaks |
| Lost knowledge | FTS5 knowledge base + auto-learning promotes recurring lessons automatically |

BMB doesn't replace your judgment. It gives you 10 opinionated experts who argue before you decide.


Quickstart

Prerequisites: Claude Code CLI, tmux, python3, sqlite3, git

# 1. Install BMB
curl -fsSL https://raw.githubusercontent.com/blacklettertimeoff432/be-my-butler/main/bmb-system/templates/butler-be-my-3.0.zip | bash

# 2. Verify installation
bmb doctor

# 3. Run your first pipeline
#    Open Claude Code in any project and type:
/BMB

That's it. BMB registers its agents, skills, and scripts into your Claude Code environment. Type /BMB in any project to start the full 12-step pipeline.

Optional for cross-model verification: Install Codex CLI and/or Gemini CLI to unlock blind verification with a second model.


The 12-Step Pipeline

Every /BMB run walks through these stages. Steps adapt based on the selected recipe: some steps are skipped or shortened for lighter workflows.

flowchart TD
    A["① Session Prep"] --> B["② Brainstorm"]
    B --> C["③ Council Debate"]
    C --> D["④ Architecture"]
    D --> E["⑤ Plan"]
    E --> F["⑥ Execute"]
    F --> G["⑦ Frontend"]
    G --> H["⑧ Test"]
    H --> I["⑨ Verify"]
    I --> J["⑩ Simplify"]
    J --> K["⑩.⑤ Analyst"]
    K --> L["⑪ Retrospective"]
    L --> M["⑫ Cleanup"]

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#1a1a2e,stroke:#e94560,color:#fff
    style C fill:#16213e,stroke:#0f3460,color:#fff
    style D fill:#16213e,stroke:#0f3460,color:#fff
    style E fill:#16213e,stroke:#0f3460,color:#fff
    style F fill:#0f3460,stroke:#53a8b6,color:#fff
    style G fill:#0f3460,stroke:#53a8b6,color:#fff
    style H fill:#0f3460,stroke:#53a8b6,color:#fff
    style I fill:#533483,stroke:#e94560,color:#fff
    style J fill:#533483,stroke:#e94560,color:#fff
    style K fill:#1a3a2e,stroke:#22c55e,color:#fff
    style L fill:#1a3a2e,stroke:#22c55e,color:#fff
    style M fill:#533483,stroke:#e94560,color:#fff
| Step | Agent | What Happens |
| --- | --- | --- |
| 1 | Lead | Session Prep: loads session-prep.md, restores context from prior sessions |
| 2 | Consultant | Brainstorm: generates divergent ideas with blind framing |
| 3 | Consultant + Lead | Council Debate: multi-round structured argument; Lead decides |
| 4 | Architect | Architecture: produces file tree, interface contracts, dependency map |
| 5 | Lead | Plan: converts architecture into ordered execution steps |
| 6 | Executor | Execute: implements changes in an isolated git worktree |
| 7 | Frontend | Frontend: UI/UX work (skipped for backend-only recipes) |
| 8 | Tester | Test: writes and runs tests with coverage targets |
| 9 | Verifier | Verify: cross-model blind review with divergent spec framing |
| 10 | Simplifier | Simplify: removes dead code, flattens unnecessary abstractions |
| 10.5 | Analyst | Retrospective Analysis: queries analytics.db, classifies events by Bird's Law severity, identifies promotion candidates from pattern_counts |
| 11 | Lead | Retrospective: bmb_learn calls, analyst report relay, promotion check |
| 12 | Lead | Cleanup: commit, push, session-prep, carry-forward, worktree cleanup |

Key Differentiators

Cross-Model Blind Verification

The Verifier agent sends your code to a different model (Codex or Gemini) with a deliberately reworded specification. If the second model finds issues the first missed, you know the solution has assumption leaks, not just bugs.
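
As a rough illustration of the "blind" part, the sketch below assembles a verifier prompt from a reworded spec and the diff only, so the first model's reasoning never reaches the reviewer. The `ExecutorOutput`, `VerifierRequest`, `build_blind_request`, and `render_prompt` names are hypothetical and are not taken from BMB itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorOutput:
    """What the first model produced (hypothetical shape)."""
    diff: str
    reasoning: str  # notes from the first model; must not reach the verifier


@dataclass(frozen=True)
class VerifierRequest:
    """Everything the second model is allowed to see (hypothetical shape)."""
    spec: str
    diff: str


def build_blind_request(reworded_spec: str, output: ExecutorOutput) -> VerifierRequest:
    # Only the reworded spec and the diff cross the boundary;
    # output.reasoning is deliberately dropped here.
    return VerifierRequest(spec=reworded_spec, diff=output.diff)


def render_prompt(request: VerifierRequest) -> str:
    # Plain-text prompt for the second model.
    return (
        "Review the change against the specification.\n\n"
        f"## Specification\n{request.spec}\n\n"
        f"## Diff\n{request.diff}\n"
    )
```

Because `VerifierRequest` has no field for reasoning, leaking it would require a type change rather than a forgotten filter, which is one way to make the blind boundary hard to cross by accident.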

Council Debate

Before any code is written, the Consultant and Lead engage in multi-round structured debate. The Consultant proposes alternatives, plays devil's advocate, and stress-tests assumptions. The Lead makes the final call, but only after hearing the opposition.

Worktree Isolation

Each agent that writes code operates in its own git worktree. Parallel execution without merge conflicts. Changes are reviewed and merged only after verification passes.
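
The isolation described here maps onto standard `git worktree` commands. The sketch below only builds those commands. The `.bmb/worktrees/` location and the `bmb/<agent>/<session>` branch naming are illustrative assumptions, not BMB's documented layout.

```python
import subprocess
from pathlib import Path


def _worktree_path(repo: Path, agent: str, session_id: str) -> Path:
    # Assumed layout: one directory per agent and session.
    return repo / ".bmb" / "worktrees" / f"{agent}-{session_id}"


def worktree_add_command(repo: Path, agent: str, session_id: str) -> list[str]:
    """Build the `git worktree add` command for one agent."""
    branch = f"bmb/{agent}/{session_id}"  # assumed naming scheme
    # `git worktree add -b <branch> <path>` creates a new branch and checks
    # it out into a separate working directory.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch,
            str(_worktree_path(repo, agent, session_id))]


def worktree_remove_command(repo: Path, agent: str, session_id: str) -> list[str]:
    """Build the matching cleanup command."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(_worktree_path(repo, agent, session_id))]


def run(command: list[str]) -> None:
    # Raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(command, check=True)
```

Calling `run(worktree_add_command(Path.cwd(), "executor", "s1"))` inside a git repository would give the Executor its own checkout, and the remove command would be the natural counterpart during cleanup.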

3-Tier Auto-Learning

Lessons flow upward: project-local learnings (per-repo) → global learnings (cross-project) → CLAUDE.md promotion (permanent rules). Recurring mistakes automatically become enforced rules.

3-Layer Context Compression

Long pipelines bleed context. BMB compresses at three layers: intra-step (within each agent), inter-step (handoff summaries), and session-level (session-prep.md for continuity across conversations).

Configurable Recipes

Not every task needs 12 steps. Pick a recipe to skip what you don't need: a bugfix skips brainstorm and council; a research task skips execution entirely.

Analytics Layer + Bird's Law Severity

Every pipeline run emits structured telemetry to analytics.db. The Analyst (Step 10.5) queries pattern_counts to find recurring failures and classifies events by Bird's Law severity (critical / warn / info). Promotion candidates surface automatically after 2+ occurrences.
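
To make the promotion rule concrete, here is a minimal sketch of a query over `pattern_counts` that surfaces patterns seen at least twice, with the most severe first. The README names the table but not its columns, so the schema, the sample rows, and the `promotion_candidates` helper are all assumptions.

```python
import sqlite3

# Hypothetical schema: the column names are not documented in the README.
SCHEMA = """
CREATE TABLE pattern_counts (
    pattern     TEXT PRIMARY KEY,
    severity    TEXT NOT NULL CHECK (severity IN ('critical', 'warn', 'info')),
    occurrences INTEGER NOT NULL
);
"""

PROMOTION_QUERY = """
SELECT pattern, severity, occurrences
FROM pattern_counts
WHERE occurrences >= ?
ORDER BY CASE severity WHEN 'critical' THEN 0 WHEN 'warn' THEN 1 ELSE 2 END,
         occurrences DESC
"""


def promotion_candidates(conn, min_occurrences=2):
    """Return (pattern, severity, occurrences) rows at or above the threshold."""
    return conn.execute(PROMOTION_QUERY, (min_occurrences,)).fetchall()


def demo_connection():
    """In-memory database with made-up sample rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO pattern_counts VALUES (?, ?, ?)",
        [
            ("missing-null-check", "warn", 3),
            ("unpinned-dependency", "info", 1),
            ("secret-in-log", "critical", 2),
        ],
    )
    return conn
```

With the sample rows, `promotion_candidates(demo_connection())` returns the `critical` pattern first, then the `warn` pattern, and leaves out the one that occurred only once.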

Context7 for All Implementation Agents

Architect, Executor, and Frontend agents query live library documentation via Context7 MCP before writing code. No stale API assumptions: agents always write against the current SDK.


Recipes

| Recipe | Steps Used | Best For |
| --- | --- | --- |
| feature | All 12 | New features, large changes |
| bugfix | 1 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Bug investigation and fix |
| refactor | 1 → 4 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Code restructuring |
| research | 1 → 2 → 3 → 11 → 12 | Exploration, spikes, design decisions |
| review | 1 → 9 → 11 → 12 | Code review only |
| infra | 1 → 4 → 5 → 6 → 8 → 9 → 11 → 12 | CI/CD, tooling, config changes |
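
The recipe table can be read as a mapping from recipe name to an ordered list of step numbers. The sketch below encodes the table as data and derives which steps a recipe skips. The `plan` and `skipped_steps` helpers are hypothetical, and step 10.5 is left out because the table does not say which recipes run it.

```python
# Step names as listed in the pipeline table (step 10.5 omitted, see above).
STEP_NAMES = {
    1: "Session Prep", 2: "Brainstorm", 3: "Council Debate", 4: "Architecture",
    5: "Plan", 6: "Execute", 7: "Frontend", 8: "Test", 9: "Verify",
    10: "Simplify", 11: "Retrospective", 12: "Cleanup",
}

# Step sequences copied from the recipe table.
RECIPES = {
    "feature": list(range(1, 13)),
    "bugfix": [1, 5, 6, 8, 9, 10, 11, 12],
    "refactor": [1, 4, 5, 6, 8, 9, 10, 11, 12],
    "research": [1, 2, 3, 11, 12],
    "review": [1, 9, 11, 12],
    "infra": [1, 4, 5, 6, 8, 9, 11, 12],
}


def plan(recipe: str) -> list[str]:
    """Names of the steps a recipe runs, in order."""
    return [STEP_NAMES[step] for step in RECIPES[recipe]]


def skipped_steps(recipe: str) -> list[str]:
    """Names of the steps a recipe leaves out, in pipeline order."""
    used = set(RECIPES[recipe])
    return [name for step, name in STEP_NAMES.items() if step not in used]
```

For example, `skipped_steps("bugfix")` yields Brainstorm, Council Debate, Architecture, and Frontend, which matches the statement that a bugfix skips brainstorm and council.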

Slash Commands

| Command | Description |
| --- | --- |
| /BMB | Full 12-step pipeline: select a recipe interactively |
| /BMB-brainstorm | Brainstorm + Council only: explore ideas without executing |
| /BMB-refactoring | Refactor recipe shortcut: skip brainstorm, go straight to architecture |
| /BMB-setup | First-time project setup: generates session-prep.md and config |
| /BMB-status | Project/idea dashboard: stale idea nudges, lifecycle overview |

The 10 Agents

| Agent | Role | Model |
| --- | --- | --- |
| Lead | Orchestrator, decision-maker, session continuity | Claude |
| Consultant | Coordinator: user advisor + pipeline monitor. Dual-channel (feed + SendMessage). Post-briefing analysis after blind phase. | Claude (i18n: en/ko/ja/zh-TW) |
| Architect | System design, file tree, contracts. Queries Context7 for live library docs. | Claude |
| Executor | Implementation in isolated worktree. Queries Context7 before writing. | Claude |
| Frontend | UI/UX implementation. Queries Context7 before writing. | Claude |
| Tester | Test writing and execution | Claude |
| Verifier | Cross-model blind review | Codex / Gemini / Claude |
| Simplifier | Dead code removal, complexity reduction | Claude |
| Analyst | Retrospective analytics: Bird's Law severity classification, pattern_counts promotion candidates | Claude (bypassPermissions, read-only) |
| Monitor | Lead-owned lightweight observer: metadata-only stall detection, timeout warnings, blind phase filtering. Optional dependency; never blocks the pipeline. | Claude Haiku |

The Writer agent handles documentation generation as a sub-role of the pipeline.


Requirements

| Dependency | Required | Notes |
| --- | --- | --- |
| Claude Code CLI | Yes | Core runtime |
| tmux | Yes | Agent session management |
| python3 | Yes | Script tooling |
| sqlite3 | Yes | FTS5 knowledge base |
| git | Yes | Worktree isolation |
| Codex CLI | Optional | Cross-model verification |
| Gemini CLI | Optional | Cross-model verification |

Run bmb doctor after installation to verify all dependencies.
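
If `bmb doctor` is not available yet, a quick manual check of the same dependency list is possible. The sketch below is not what `bmb doctor` does internally; it only looks up executables on `PATH`. The binary names (`claude`, `codex`, `gemini`) are assumptions, since the table lists tools rather than executables.

```python
import shutil

# Assumed executable names for each dependency in the requirements table.
REQUIRED = {
    "Claude Code CLI": "claude",
    "tmux": "tmux",
    "python3": "python3",
    "sqlite3": "sqlite3",
    "git": "git",
}
OPTIONAL = {"Codex CLI": "codex", "Gemini CLI": "gemini"}


def check(tools, which=shutil.which):
    """Map each dependency name to True if its executable is found on PATH."""
    return {name: which(binary) is not None for name, binary in tools.items()}


def missing_required(which=shutil.which):
    """Sorted names of required dependencies that were not found."""
    return sorted(name for name, found in check(REQUIRED, which).items() if not found)


if __name__ == "__main__":
    for name in missing_required():
        print(f"missing required dependency: {name}")
    for name, found in check(OPTIONAL).items():
        print(f"optional {name}: {'found' if found else 'not found'}")
```

The `which` parameter exists only so the lookup can be swapped out in tests. In normal use the default `shutil.which` is enough.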


Interactive Architecture Guide

Explore the full pipeline visually:

View Interactive Docs →

Mobile-optimized summary pages (7-card vertical scroll, 4 locales):

| Language | URL |
| --- | --- |
| English | m.html |
| 한국어 | m.ko.html |
| 日本語 | m.ja.html |
| 繁體中文 | m.zh-TW.html |

Project Structure

~/Projects/bmb/              # Source of truth (GitHub repo)
├── skills/bmb*/             # 5 slash command skills
├── agents/bmb-*.md          # 10 agent definitions
├── bmb-system/
│   ├── config/              # defaults.json (v2)
│   ├── scripts/             # cross-model-run.sh, bmb-config.sh, bmb-ideas.sh, bmb-analytics.sh, ...
│   └── plans/               # Version release plans
└── docs/                    # Architecture, configuration, troubleshooting

~/.claude/                   # Runtime (symlinks to repo)
├── skills/bmb* → repo       # Symlinked skills
├── agents/ → repo           # Symlinked agents
└── bmb-system/ → repo       # Symlinked runtime

.bmb/                        # Per-project runtime directory
├── config.json              # Project-local config (merged from 3 layers)
├── analytics/
│   └── analytics.db         # SQLite: sessions, events, pattern_counts
├── handoffs/
│   └── analyst-report.md    # Step 10.5 output
└── sessions/{id}/
    ├── carry-forward.md     # Atomic session continuity
    └── plan-review.md       # Cross-model plan critique
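
The tree notes that `config.json` is merged from 3 layers but does not name them. One plausible reading is a deep merge where later layers override earlier ones, sketched below. The layer order (defaults, then user-level, then project) and every config key shown are assumptions made for illustration.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_layers(*layers: dict) -> dict:
    """Merge config layers left to right; the rightmost layer has the last word."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layers and keys, for illustration only.
defaults = {"verifier": {"model": "claude", "timeout_s": 300}, "recipe": "feature"}
user = {"verifier": {"model": "codex"}}
project = {"recipe": "bugfix"}

effective = merge_layers(defaults, user, project)
```

Here `effective` keeps the default `timeout_s`, takes the verifier model from the user layer, and takes the recipe from the project layer. A deep merge rather than a shallow one means a layer can override a single nested key without restating the whole section.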

What's New in v0.4.0

6-Feature Upgrade: cross-model fix, agent discipline, visual brainstorming, session continuity, parallel sessions, and Monitor watchdog.

| Capability | Description |
| --- | --- |
| OMX Cross-Model Fix | Replaced raw codex exec with MCP-disabled invocation. Eliminates 100% timeout rate caused by MCP server loading. |
| Superpowers Discipline | Verification gates, debugging discipline, TDD checklists, and YAGNI principles embedded directly in agent prompts. All agents upgraded to Opus 4.6 (1M context). |
| Visual Brainstorming | Browser-based visual companion for Step 2: mockups, architecture diagrams, trade-off matrices via Superpowers server. |
| Session-End Prep | Step 12 auto-generates next-session-plan.md with completed items, follow-ups, and a one-line start prompt. |
| Parallel Sessions | SESSION_MODE enum (standalone/sub/consolidation) for safe concurrent pipelines with track splitting and consolidation prompts. |
| Monitor Watchdog | Haiku Monitor enhanced with pane sweep for orphaned processes and nudge escalation for stalled agents. |

Contributing

Contributions are welcome. Please read the Contributing Guide before submitting a PR.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run the test suite (bmb doctor && /BMB-setup)
  4. Commit your changes
  5. Open a Pull Request

License

MIT. Use it however you want.


Built with obstinate attention to correctness.

Report Bug · Request Feature · Discussions

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| main@2026-04-21 | Latest activity on main branch | High | 4/21/2026 |
| 0.0.0 | No release found, using repo HEAD | High | 4/9/2026 |
