Orchestra — how to orchestrate agents without making a mess
Opinionated operator guidance for multi-agent systems: what scales, what fails, and where to keep humans in the loop.
Add a real review gate before side effects (P0)
- Separate proposer and reviewer roles.
- Require review of both spec compliance and code quality.
- Block deploy/write actions until review passes.
Require tool-grounded reads before action (P0)
- Fetch current state before answering or mutating.
- Prefer logs, DB rows, and file reads over prompt memory.
- Save structured observations for downstream agents.
Introduce task contracts between agents (P0)
- Each delegated task includes objective, allowed tools, success criteria, and max budget.
- Outputs are structured: result, evidence, unresolved risks.
- No worker gets implicit permission to mutate unrelated surfaces.
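One way such a contract could look, sketched with Python dataclasses. The names `TaskContract`, `TaskResult`, and `is_tool_allowed`, and the choice to track budget in dollars, are illustrative assumptions and not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskContract:
    """What a supervisor hands to a worker (hypothetical shape)."""
    objective: str
    allowed_tools: frozenset[str]
    success_criteria: tuple[str, ...]
    max_budget_usd: float  # assumption: budget is tracked in dollars

    def is_tool_allowed(self, tool: str) -> bool:
        # Anything not listed is denied: no implicit permissions.
        return tool in self.allowed_tools


@dataclass
class TaskResult:
    """Structured worker output: result, evidence, unresolved risks."""
    result: str
    evidence: list[str] = field(default_factory=list)
    unresolved_risks: list[str] = field(default_factory=list)


# Example contract with made-up values.
contract = TaskContract(
    objective="Add an index to the orders table",
    allowed_tools=frozenset({"read_schema", "propose_migration"}),
    success_criteria=("migration applies cleanly", "query plan uses the index"),
    max_budget_usd=2.0,
)
```

Freezing the contract means a worker cannot widen its own permissions after delegation; any tool outside `allowed_tools` is refused by default.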
Codify human escalation boundaries (P1)
- List destructive and regulated actions that require human approval.
- Bundle evidence with each escalation.
- Keep a visible queue of blocked tasks.
Centralize the artifact spine (P1)
- One plan or issue doc per workstream.
- Attach tests/logs/receipts to the work item.
- Record reversals and operator overrides.
Human escalation thresholds (production)
Define exactly when the orchestra stops and asks a human: production writes, secrets, payments, legal, or ambiguous user intent.
Why it works: Strong orchestration is not full autonomy — it is clean escalation at the right boundary.
Tags: safety, ops, human-in-the-loop
Do this
- Codify escalation triggers instead of relying on agent intuition.
- Expose pending approvals in one queue.
- Capture the full evidence bundle that caused escalation.
Avoid this
- Human approval for every trivial step.
- No human review for destructive actions.
- Escalation with no context, logs, or diff attached.
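The escalation rules above can be sketched as a small policy plus a single approval queue. Everything here is hypothetical: the trigger names in `ESCALATION_TRIGGERS`, the `ApprovalQueue` class, and the sample evidence are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical trigger set mirroring the categories named in this card.
ESCALATION_TRIGGERS = {
    "production_write", "secrets", "payments", "legal", "ambiguous_intent",
}


@dataclass
class Escalation:
    task_id: str
    trigger: str
    evidence: dict  # logs, diff, and context that caused the escalation


@dataclass
class ApprovalQueue:
    """Single visible queue of tasks blocked on a human."""
    pending: list[Escalation] = field(default_factory=list)

    def submit(self, task_id: str, action_kind: str, evidence: dict) -> bool:
        """Return True if the action may proceed without a human."""
        if action_kind not in ESCALATION_TRIGGERS:
            return True  # trivial step: no approval needed
        if not evidence:
            raise ValueError("escalation requires an evidence bundle")
        self.pending.append(Escalation(task_id, action_kind, evidence))
        return False


queue = ApprovalQueue()
can_run_read = queue.submit("t1", "read_logs", {})
can_run_payment = queue.submit("t2", "payments", {"diff": "refund request"})
```

Here the read proceeds immediately, while the payment lands in `queue.pending` with its evidence attached. An escalation submitted with no evidence raises an error instead of reaching a human without context.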
Review-gated execution lane (production)
Separate generation from approval: one agent proposes changes, another checks spec/security, then the executor applies.
Why it works: It catches shallow reasoning, over-broad edits, and unsafe side effects before they hit prod.
Tags: review, safety, deployment
Do this
- Use at least one explicit review gate for schema changes, auth, billing, or deploys.
- Review against both product spec and code quality — not just tests passing.
- Keep reviewer prompts adversarial: ask what could break, leak, or drift.
Avoid this
- Same agent writes and rubber-stamps its own work.
- Review happening only after merge.
- Treating green CI as the only approval signal.
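A minimal sketch of the propose, review, apply lane, assuming each role is a plain Python callable. `run_gated`, `Proposal`, `Verdict`, and the toy reviewer are hypothetical names; the string check stands in for a real spec and security review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    author: str
    diff: str


@dataclass
class Verdict:
    reviewer: str
    approved: bool
    notes: str


def run_gated(proposal: Proposal,
              review: Callable[[Proposal], Verdict],
              apply: Callable[[Proposal], str]) -> str:
    """Apply a proposal only after an independent reviewer approves it."""
    verdict = review(proposal)
    if verdict.reviewer == proposal.author:
        raise PermissionError("author cannot approve their own proposal")
    if not verdict.approved:
        return f"blocked: {verdict.notes}"
    return apply(proposal)


# Toy reviewer: a stand-in for a spec/security check, not a real one.
def toy_review(p: Proposal) -> Verdict:
    risky = "DROP TABLE" in p.diff
    return Verdict("reviewer-agent", not risky, "destructive SQL" if risky else "ok")


applied = []  # records which diffs the executor actually applied


def executor(p: Proposal) -> str:
    applied.append(p.diff)
    return "applied"


outcome_ok = run_gated(Proposal("coder-agent", "ADD COLUMN note TEXT"), toy_review, executor)
outcome_bad = run_gated(Proposal("coder-agent", "DROP TABLE users"), toy_review, executor)
```

The executor never runs for the rejected proposal, so the gate sits before the side effect, not after it.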
Shared artifact spine (team)
Coordinate through explicit artifacts: plans, issue specs, receipts, test outputs, and decision logs.
Why it works: Artifacts survive context windows and prevent hidden assumptions between agents.
Tags: memory, handoff, coordination
Do this
- Use one canonical task doc per workstream.
- Store acceptance criteria next to the artifact, not only in chat.
- Log decisions and reversals so later agents know why a path changed.
Avoid this
- Coordination purely through chat memory.
- Multiple diverging TODO lists.
- Undocumented manual fixes by human operators.
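One possible shape for a decision log is an append-only list of JSON lines, where a reversal points back at the entry it overrides. The field names and the `log_decision` helper below are assumptions for illustration, and the sample workstream is made up.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def log_decision(log: list[str], workstream: str, actor: str, decision: str,
                 reason: str, reverses: Optional[int] = None) -> int:
    """Append one decision as a JSON line; return its index for later reference."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "workstream": workstream,
        "actor": actor,
        "decision": decision,
        "reason": reason,
        "reverses": reverses,  # index of the entry this one overrides, if any
    }
    log.append(json.dumps(entry))
    return len(log) - 1


log: list[str] = []
first = log_decision(log, "billing-migration", "planner-agent",
                     "use blue/green deploy", "zero downtime required")
second = log_decision(log, "billing-migration", "human:operator",
                      "switch to maintenance window", "manual override",
                      reverses=first)
```

Because the operator override is logged like any other decision, a later agent can see both what changed and why, instead of finding an unexplained manual fix.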
Small-batch delegation (prototype)
Start with 2–3 concurrent agents on independent slices, then scale only after measuring merge pain and review load.
Why it works: Parallelism helps only when synthesis cost stays lower than the work you save.
Tags: delegation, throughput, cost
Do this
- Split by file boundary or concern boundary, not by vague themes.
- Cap parallelism until you can measure collision rate.
- Always reserve one lane for validation and synthesis.
Avoid this
- Spawning ten agents into the same surface area.
- Parallel agents editing the same auth/config files.
- Assuming more agents always means more speed.
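The text does not define collision rate. One plausible definition, assumed here, is the fraction of agent pairs whose edited file sets overlap. The function name and the sample file paths are hypothetical.

```python
from itertools import combinations


def collision_rate(touched: dict[str, set[str]]) -> float:
    """Fraction of agent pairs whose edited file sets overlap."""
    pairs = list(combinations(touched, 2))
    if not pairs:
        return 0.0
    collisions = sum(1 for a, b in pairs if touched[a] & touched[b])
    return collisions / len(pairs)


# Made-up example: two of three agents edited the same auth config.
touched = {
    "agent-a": {"api/routes.py", "config/auth.yaml"},
    "agent-b": {"ui/form.tsx"},
    "agent-c": {"config/auth.yaml", "db/models.py"},
}
rate = collision_rate(touched)  # one colliding pair out of three
```

Tracking this number per batch gives a concrete signal for when to raise or lower the parallelism cap.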
Supervisor → worker graph (production)
Use one planner/supervisor to break work into bounded sub-tasks and route them to narrow workers.
Why it works: You keep strategy centralized while shrinking the context and permissions each worker needs.
Tags: delegation, supervision, routing
Do this
- Make workers single-purpose: code, research, QA, or deployment — not everything at once.
- Pass explicit task contracts with success criteria, budget, and allowed tools.
- Require the supervisor to synthesize worker outputs before taking side-effecting actions.
Avoid this
- Letting every agent talk to every other agent freely.
- Giving all workers the full repo and full prompt history by default.
- No review gate before write or deploy actions.
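A rough sketch of the routing and synthesis step, assuming workers are plain functions keyed by role. The `WORKERS` table, the `supervise` function, and the rule that any open risk blocks side effects are illustrative assumptions, not a prescribed design.

```python
from typing import Callable

# Hypothetical single-purpose workers; each is a plain function.
WORKERS: dict[str, Callable[[str], dict]] = {
    "research": lambda task: {"result": f"notes on {task}", "risks": []},
    "qa": lambda task: {"result": f"tests for {task}", "risks": ["flaky test"]},
}


def supervise(subtasks: list[tuple[str, str]]) -> dict:
    """Route each (role, task) to its worker, then synthesize before acting."""
    outputs = []
    for role, task in subtasks:
        if role not in WORKERS:
            raise KeyError(f"no worker for role {role!r}")
        outputs.append(WORKERS[role](task))
    risks = [r for out in outputs for r in out["risks"]]
    return {
        "results": [out["result"] for out in outputs],
        "open_risks": risks,
        # Assumption: side effects stay blocked while any risk is open.
        "may_proceed": not risks,
    }


summary = supervise([("research", "rate limiting"), ("qa", "rate limiting")])
```

Workers never call each other; only the supervisor sees every output, which keeps the decision to act in one place.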
Tool-first grounding (team)
Make agents inspect live state before deciding: files, logs, DB rows, process state, metrics.
Why it works: Most orchestration failures come from agents acting on stale assumptions instead of current system state.
Tags: observability, grounding, tooling
Do this
- Require a live read before any irreversible action.
- Prefer deterministic tools over memory for versions, counts, and current configs.
- Persist structured outputs so downstream agents inherit facts instead of prose guesses.
Avoid this
- Agents answering from memory for current facts.
- Long prompt chains with no system-state refresh.
- Passing screenshots or summaries when raw logs are available.
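The "live read before any irreversible action" rule can be sketched as a guard that refuses to act on a stale or mismatched observation. `act_if_fresh`, `Observation`, the five-second freshness window, and the version strings are all hypothetical; `read_version` stands in for a real tool call.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    source: str
    value: str
    read_at: float  # time.monotonic() timestamp of the read


def act_if_fresh(read_state: Callable[[], Observation],
                 expected: str,
                 action: Callable[[], str],
                 max_age_s: float = 5.0) -> str:
    """Run an irreversible action only after a fresh read confirms expected state."""
    obs = read_state()
    if time.monotonic() - obs.read_at > max_age_s:
        return "aborted: observation is stale"
    if obs.value != expected:
        return f"aborted: {obs.source} is {obs.value!r}, expected {expected!r}"
    return action()


def read_version() -> Observation:
    # Stand-in for a real tool call (file read, DB query, metrics API).
    return Observation("deployed_version", "v1.4.2", time.monotonic())


ok = act_if_fresh(read_version, "v1.4.2", lambda: "rolled back")
mismatch = act_if_fresh(read_version, "v1.3.0", lambda: "rolled back")
```

Returning the structured `Observation` instead of a prose summary also lets downstream agents reuse the fact and check its age themselves.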
