autonomous-agentic-research-swarm

File-based autonomous agentic research swarm template (Planner/Worker/Judge) with contracts, workstreams, and deterministic quality gates.

Autonomous Agentic Research Swarm

This repository is a repo-native research operating system for the current L2-to-L1 rent analysis project. v1 is designed to carry one real project from definition lock to a reproducible working-paper release.

Locked empirical artifact DAG

registry -> raw snapshots/manifests -> processed datasets/manifests -> validation -> figures/tables -> Quarto paper -> release manifest

The current project is not releasable until that full path exists.
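
As a minimal sketch of the rule above, the chain can be modeled as an ordered list of stages, with release allowed only when none is missing. The stage labels and both function names are illustrative assumptions that paraphrase the DAG; they are not identifiers from the repository.

```python
# Stage labels paraphrase the artifact DAG above; they are illustrative,
# not names taken from the repository.
STAGES = [
    "registry",
    "raw_snapshots",
    "processed_datasets",
    "validation",
    "figures_tables",
    "quarto_paper",
    "release_manifest",
]


def first_missing_stage(completed):
    """Return the earliest stage not in `completed`, or None if the path is whole."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None


def is_releasable(completed):
    """Releasable only when every stage in the chain exists."""
    return first_missing_stage(completed) is None
```

Reporting the earliest missing stage, rather than a bare yes/no, points at the next upstream artifact that has to be produced.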

Default execution paths

  • Local swarm (scripts/swarm.py + .orchestrator/) is the default engine for routine repo task execution, deterministic gates, and normal multi-agent delivery.
  • Reviewed staged-workflow-runner path is for high-stakes Operator-owned synthesis work such as architecture rewrites, major replans, and release assessments.
  • The paper substrate is Quarto-backed Markdown under reports/paper/.

Four-role operating model

  • Operator: runtime preflight, worktree/tmux supervision, repair handling, sweeps, run/review/release logging, catalog refresh, and release assembly.
  • Planner: task decomposition, dependency wiring, workstream ownership, and lifecycle projection.
  • Worker: one assigned task, one isolated worktree, one explicit output contract.
  • Judge: reruns gates, verifies outputs and provenance, and is the only role allowed to mark work done.
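
The Judge-only completion rule can be pictured as a small permission check. This is a hedged sketch: the `Role` enum, the `mark_done` function, and the dictionary task shape are all hypothetical and do not reflect how scripts/swarm.py represents tasks.

```python
from enum import Enum


class Role(Enum):
    OPERATOR = "operator"
    PLANNER = "planner"
    WORKER = "worker"
    JUDGE = "judge"


def mark_done(task, role):
    """Return a copy of `task` marked done; only the Judge may do this.

    `task` is a hypothetical dict with at least an "id" key.
    """
    if role is not Role.JUDGE:
        raise PermissionError(f"{role.value} may not mark {task['id']} done")
    return {**task, "status": "done"}
```

For example, `mark_done({"id": "T025", "status": "review"}, Role.WORKER)` raises `PermissionError`, while the same call with `Role.JUDGE` returns the task with `status` set to `"done"`.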

Current battle-test queue

  1. T025: populate registry/rollup_registry_v1.csv with evidence-backed in-scope rows.
  2. T030: pull growthepie snapshots, write raw manifests, normalize the vendor panel, and commit a tiny deterministic sample.
  3. T035: build the authoritative on-chain L1 rent path, write processed manifests, and materialize the canonical daily_rollup_panel.
  4. T040: lock STR math in src/analysis/metrics_str.py with sample-only tests.
  5. T050: validate the canonical panel, L1 rent decomposition, and cross-source reconciliation.
  6. T060: generate release figures and tables from validated artifacts only.
  7. T070: write Quarto manuscript source and confirm a draft render path.
  8. T080: Operator release assembly: compile reports/catalog.yaml, render final paper artifacts, and write the release manifest.

What counts as a release candidate

A release candidate must include all of the following:

  • an evidence-backed registry/rollup_registry_v1.csv
  • raw manifests for growthepie and the on-chain L1 rent pull
  • processed manifests for the vendor panel, L1 rent decomposition, and canonical rollup panel
  • validation JSON/Markdown outputs
  • release figures and tables
  • Quarto paper source plus rendered HTML/PDF and render_manifest.json
  • reports/catalog.yaml compiled from successful run manifests
  • reports/status/releases/release_<YYYY-MM-DD>.json

A sample figure alone is not battle-test success.
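
A partial sketch of such a check, assuming the three paths this README names explicitly are enough to illustrate the idea. The remaining items (manifests, validation outputs, figures, rendered paper) have no fixed filenames stated here, so they are deliberately left out, and both function names are hypothetical.

```python
from pathlib import Path


def required_fixed_paths(release_date):
    """Paths the README names explicitly; `release_date` is 'YYYY-MM-DD'."""
    return [
        Path("registry/rollup_registry_v1.csv"),
        Path("reports/catalog.yaml"),
        Path(f"reports/status/releases/release_{release_date}.json"),
    ]


def missing_artifacts(root, release_date):
    """List the named release artifacts that are not files under `root`."""
    root = Path(root)
    return [
        p.as_posix()
        for p in required_fixed_paths(release_date)
        if not (root / p).is_file()
    ]
```

An empty list from `missing_artifacts(".", "2026-04-11")` would only show that these three files exist; it says nothing about whether their contents pass the gates.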

Repository map

  • .orchestrator/: file-based control plane, task queue, templates, and handoffs
  • contracts/: project instance contract, framework policy, empirical definitions, and hybrid/modeling interfaces
  • docs/: protocol lock, runbooks, and role prompts
  • registry/: versioned rollup universe contract
  • data/raw_manifest/ and data/processed_manifest/: tracked provenance for raw and processed artifacts
  • src/etl/, src/validation/, src/analysis/, src/model/: code split by responsibility
  • reports/: validation outputs, figures, tables, Quarto paper source/build, catalog, and release manifests
  • tests/: fast offline tests on tracked samples

Quickstart

  1. Read AGENTS.md.
  2. Review docs/protocol.md, contracts/project.yaml, and contracts/framework.json.
  3. Inspect .orchestrator/workstreams.md and the live backlog under .orchestrator/backlog/.
  4. Run make gate and make test on the base branch.
  5. Use docs/runbook_swarm.md for manual execution or docs/runbook_swarm_automation.md for the default local swarm path.
  6. When upstream tasks are done, run the Operator release path with python scripts/release_assembly.py --release-date YYYY-MM-DD --check.
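
For step 6, a wrapper might validate the date before invoking the script. This sketch only builds the argument list; the helper name is hypothetical, and the script path and flags are copied from the step above without assuming anything about what the script does internally.

```python
import sys
from datetime import date


def release_check_argv(release_date):
    """Build the argv for the release check; raises ValueError on a bad date."""
    # Round-trip through date so the value is normalized to YYYY-MM-DD.
    normalized = date.fromisoformat(release_date).isoformat()
    return [
        sys.executable,
        "scripts/release_assembly.py",
        "--release-date",
        normalized,
        "--check",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root.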

Mode coverage

  • Empirical is the active mode for this repo and the only mode that may be claimed as end-to-end ready after the battle-test release succeeds.
  • Modeling remains contract-ready through contracts/model_spec.md, contracts/instances/, and contracts/experiments/, but it is not yet battle-tested here.
  • Hybrid remains contract-ready through contracts/hybrid_interface_v1.yaml; modeling tasks may consume only explicit instance manifests, not ad hoc empirical CSV paths.
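
The hybrid constraint could be enforced with a simple path guard. This sketch assumes that instance manifests live under contracts/instances/, which the README names but does not describe in detail; the function name and the traversal check are illustrative.

```python
from pathlib import PurePosixPath

# Directory named in the README; treating it as the only allowed input root
# is an assumption made for this sketch.
INSTANCE_ROOT = PurePosixPath("contracts/instances")


def is_allowed_model_input(path):
    """True if `path` is a repo-relative path under the instance root."""
    p = PurePosixPath(path)
    if ".." in p.parts:  # reject traversal out of the instance root
        return False
    return INSTANCE_ROOT in p.parents
```

Under these assumptions, an instance file such as `contracts/instances/base.yaml` (a made-up name) passes, while an ad hoc path such as `data/processed/panel.csv` is rejected.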

Release History

Version | Changes | Urgency | Date
main@2026-04-11 | Latest activity on main branch | High | 4/11/2026
v0.1.0 | Latest release: v0.1.0 | High | 4/11/2026
