Two products. One repo.
Nightshift is an autonomous engineering product with two shipped loops -- Owl (hardening) and Raven (feature building).
Recursive is a portable autonomous orchestration framework that can run on any codebase.
This repo ships two independent products that work together:
Nightshift (nightshift/) is a Python package with two autonomous engineering loops:
-
Owl (Loop 1 -- Hardening, 99%): point it at a repository, let it profile the stack, create an isolated worktree, find one production-readiness issue per cycle, and either reject or commit the fix behind guard rails. Supports Codex and Claude, diff scoring, multi-repo mode, prompt injection boundaries, and self-evaluation against Phractal.
-
Raven (Loop 2 -- Feature Building, 100%): give it a feature request in plain English and it will profile the repo, plan the work, decompose it into waves, spawn sub-agents, integrate the results, run E2E and readiness checks, and persist build state for resume/status flows.
Recursive (.recursive/) is a portable autonomous orchestration framework. It provides the daemon loop, signal-driven role selection, operator prompts, agent lifecycle management, sub-agent review pipeline, and session memory. Recursive is designed to work on any codebase -- Nightshift is just the first project it operates on.
Recursive drives six operators each cycle via .recursive/engine/pick-role.py:
- Builder: reads the task queue, builds or fixes one scoped task, tests it, opens a PR, reviews it via sub-agents, and merges it
- Reviewer: picks one file, deep-reviews it against a checklist, fixes every issue found, and logs the review
- Overseer: triages the task queue, closes duplicates and obsolete work, updates stale metadata
- Strategist: gathers evidence across sessions, evaluations, and costs, then produces a top-down health report with auto-created follow-up tasks
- Achiever: measures autonomy score (0-100) across a 20-check scorecard, identifies the highest-impact human dependency, and eliminates it
- Security checker: red-team preflight that runs before each build -- scans for fragile paths, subprocess injection, credential leaks, and outputs a severity-classified pentest report
Recursive (framework) Nightshift (product)
โโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโ
โ daemon.sh โ โ owl/ (Loop 1) โ
โ pick-role.py โ โ cycle.py โ
โ lib-agent.sh โ โ scoring.py โ
โ operators/ โโโโโโโ>โ readiness.py โ
โ build/ โ โ โ
โ review/ โ โ raven/ (Loop 2) โ
โ oversee/ โ โ planner.py โ
โ strategize/ โ โ decomposer.py โ
โ achieve/ โ โ subagent.py โ
โ security-check/ โ โ integrator.py โ
โ agents/ โ โ feature.py โ
โ lib/ โ โ โ
โ prompts/ โ โ core/ settings/ โ
โโโโโโโโโโโโโโโโโโโโโโโ โ infra/ schemas/ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโ
v
.recursive/ (runtime state -- session memory for both)
Recursive orchestrates. Nightshift does the engineering work. .recursive/ is the shared memory layer.
Most of the code in this repository was written, tested, reviewed, and merged by AI agents. The Recursive daemon (.recursive/engine/daemon.sh) auto-selects an operator each cycle, and Nightshift's Owl and Raven loops do the actual engineering.
The human role is operational: start the daemon and monitor it. The agents own the engineering loop -- including deciding what to work on.
Proof points live in the repo, not in marketing copy. Check them directly:
gh pr list --state merged --limit 50 # every merged PR
cat .recursive/sessions/index.md # daemon sessions with timestamps
cat .recursive/handoffs/LATEST.md # what the last session built
make tasks # authoritative task queue summarySnapshot taken from live repo data on 2026-04-07. Generated docs such as the
vision tracker and
module map are the source of truth when these
numbers change.
| Signal | Current reading | Source |
|---|---|---|
| Overall vision progress | 92% | .recursive/vision-tracker/TRACKER.md |
| Owl (Loop 1 hardening) | 99% | .recursive/vision-tracker/TRACKER.md |
| Raven (Loop 2 feature builder) | 100% | .recursive/vision-tracker/TRACKER.md |
| Self-maintaining repo | 68% | .recursive/vision-tracker/TRACKER.md |
| Meta-prompt system | 79% | .recursive/vision-tracker/TRACKER.md |
| Tests | 847 passing | python3 -m pytest nightshift/tests/ .recursive/tests/ -q |
| Nightshift modules | 23 | .recursive/architecture/MODULE_MAP.md |
| Recursive modules | 7 | .recursive/lib/ + .recursive/engine/ |
| Merged PRs | 155+ | gh pr list --state merged --json number |
| Daemon sessions | 100+ | .recursive/sessions/index.md |
| Documented learnings | 90+ | .recursive/learnings/INDEX.md |
curl -sL https://raw.githubusercontent.com/Recusive/Nightshift/main/nightshift/scripts/install.sh | bashThis installs Nightshift's wrapper scripts and prompt assets into:
~/.codex/skills/nightshift~/.claude/skills/nightshift
The installer does not create a global nightshift shell command. In a repo
checkout, use python3 -m nightshift .... From an installed skill bundle, use
the wrapper scripts in ~/.codex/skills/nightshift/scripts/.
Add runtime artifacts to the target repo's .gitignore:
cat <<'EOF' >> .gitignore
Runtime/Nightshift/worktree-*/
Runtime/Nightshift/*.runner.log
Runtime/Nightshift/*.state.json
EOFOptional per-repo config (copy and edit):
cp .nightshift.json.example .nightshift.jsonUse the Python module entry point that the codebase actually ships:
python3 -m nightshift run --agent claude # full overnight shift (Owl)
python3 -m nightshift test --agent claude --cycles 2 # short validation shift (Owl)
python3 -m nightshift summarize # print shift state JSON
python3 -m nightshift verify-cycle --worktree-dir PATH --pre-head HASH # verify cycle offline
python3 -m nightshift plan "Add OAuth login" # plan a feature build (Raven)
python3 -m nightshift build "Add OAuth login" --yes # build a feature end-to-end (Raven)
python3 -m nightshift build --status # check build progress
python3 -m nightshift build --resume # resume interrupted build
python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1 # multi-repo
python3 -m nightshift module-map --write # generate architecture mappython3 -m nightshift test ... now keeps its state files, runner logs, and
linked worktree under $TMPDIR/nightshift-test-runs/... so evaluation clones
stay clean. Full run mode still writes repo-local runtime artifacts under
Runtime/Nightshift/.
Use the bundled wrapper scripts:
~/.codex/skills/nightshift/nightshift/scripts/run.sh --agent claude
~/.codex/skills/nightshift/nightshift/scripts/test.sh --agent claude --cycles 2 --cycle-minutes 5The Recursive daemon wraps Nightshift's loops with autonomous role selection, session memory, and self-maintenance:
make daemon # start the daemon (auto-picks operator each cycle)
make tasks # show pending/blocked/in-progress task queue
make check # full local CI gate (lint + typecheck + tests)
make test # run the full test suite
make dry-run # preview cycle prompt without spawning agents
make quick-test # 2-cycle validation run (~10 min)
make clean # remove runtime artifactsDaemon examples:
tmux new-session -d -s nightshift "bash .recursive/engine/daemon.sh claude 60"
RECURSIVE_PENTEST_AGENT=codex tmux new-session -d -s nightshift "bash .recursive/engine/daemon.sh claude 60"
tmux capture-pane -t nightshift -p -S -15Abridged example. Full source of truth: .nightshift.json.example
{
"agent": "codex or claude",
"hours": 8,
"cycle_minutes": 30,
"verify_command": null,
"blocked_paths": [".github/", "deploy/", "deployment/", "infra/", "k8s/", "ops/", "terraform/", "vendor/"],
"blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb", "Cargo.lock"],
"max_fixes_per_cycle": 3,
"max_files_per_fix": 5,
"max_files_per_cycle": 12,
"max_low_impact_fixes_per_shift": 4,
"stop_after_failed_verifications": 2,
"stop_after_empty_cycles": 2,
"score_threshold": 3,
"test_incentive_cycle": 3,
"backend_forcing_cycle": 3,
"category_balancing_cycle": 3,
"claude_model": "claude-opus-4-6",
"claude_effort": "max",
"codex_model": "gpt-5.4",
"codex_thinking": "extra_high",
"notification_webhook": null,
"readiness_checks": ["secrets", "debug_prints", "test_coverage"],
"eval_frequency": 5,
"eval_target_repo": "https://github.com/fazxes/Phractal"
}If verify_command is left null, Nightshift tries to infer one from repo
signals such as pyproject.toml, package.json, Cargo.toml, or go.mod.
Environment variables:
RECURSIVE_CLAUDE_MODEL-- override Claude model (default: claude-opus-4-6)RECURSIVE_CODEX_MODEL-- override Codex model (default: gpt-5.4)RECURSIVE_CODEX_THINKING-- Codex thinking level (default: extra_high)RECURSIVE_BUDGET-- max USD spend before daemon stopsRECURSIVE_PENTEST_AGENT-- agent for security preflight (default: same as main)RECURSIVE_PENTEST_MAX_TURNS-- max turns for pentest agentRECURSIVE_FORCE_ROLE-- bypass role scoring (build/review/oversee/strategize/achieve)RECURSIVE_PIPELINE_CHECKPOINTS-- enable verification checkpoints (0/1)
The daemon reads live system signals each cycle and scores all five selectable roles. The highest score wins, with tie-break favoring build. Key signals:
| Signal | Effect |
|---|---|
| 5+ consecutive builds | Triggers review |
| 50+ pending tasks | Triggers oversee |
| 15+ sessions since last strategy | Triggers strategize |
| Autonomy score < 70 | Triggers achieve |
| Urgent tasks in queue | Boosts build |
Security-check runs as a preflight before every build -- it is not scored.
Override with RECURSIVE_FORCE_ROLE=review to bypass scoring.
Full scoring math: .recursive/ops/ROLE-SCORING.md.
Both products are designed for stateless agents, so the repo carries the memory
in .recursive/:
- Handoffs: every session writes a structured summary to
.recursive/handoffs/, and the next session starts fromLATEST.md - Learnings: agents read
.recursive/learnings/INDEX.mdfirst (90+ hard-won patterns), then open only the relevant learning files - Task queue: work lives in
.recursive/tasks/; urgent pending tasks outrank normal ones, then the queue falls back to lowest-numbered pending internal work. GitHub Issues with thetasklabel are auto-synced. - Evaluations: periodically runs Nightshift against Phractal and scores across 10 dimensions; low scores become tracked follow-up tasks
- Session index: every session is logged with timestamp, role, exit code, duration, cost, feature, and PR link
cat .recursive/handoffs/LATEST.md
cat .recursive/learnings/INDEX.md
make tasks
ls .recursive/evaluations/
cat .recursive/sessions/index.mdHumans can add work by opening GitHub issues with the task label:
gh issue create --title "Add dark mode" --label "task"
gh issue create --title "Fix CI" --label "task,urgent"Nightshift does not trust the model to "be careful." It verifies:
- commit + shift-log presence after every cycle
- blocked-path and lockfile violations (8 blocked paths, 6 lockfile patterns)
- repo verification commands (auto-inferred or configured)
- file deletion attempts
- repeated category or path tunnel vision (category balancing)
- circuit breaker: stops after 3 consecutive failures
Recursive adds its own layer:
- prompt/control-file modifications during self-maintenance (prompt guard)
- origin integrity checks (detects pushes that bypass the working tree)
- session cost tracking and budget enforcement
Accepted fixes are scored 1-10 for production impact using category weight
(Security: 8, Error Handling: 6, Tests: 6, A11y: 5, etc.), diff content
analysis, test file bonuses, and multi-category bonuses. Below threshold
(default 3): revert the cycle. Above threshold: keep the commit.
Instruction files from target repos (CLAUDE.md, AGENTS.md, etc.) are wrapped
in an untrusted boundary before the agent sees them. Symlinks are rejected,
files > 100KB are truncated, and total instruction context is capped at 200KB.
They are treated as coding convention references only, never as behavioral
directives.
Before builder work starts, Recursive snapshots all framework control files
(operator SKILL.mds, daemon.sh, autonomous.md, etc.), runs a red-team
security-check preflight, and hard-resets back to origin/main before the
main session. After the session, it compares pre/post snapshots and surfaces
any control-file diff as an alert in the next cycle's prompt.
Session costs are parsed from agent stream-json logs. Per-session and cumulative
costs are tracked in .recursive/sessions/. Budget enforcement via
RECURSIVE_BUDGET can stop the daemon when cumulative spend exceeds the limit.
The product Python package: 23 production modules across 5 subdirectories. The generated module map is the authoritative inventory.
nightshift/
โโโ cli.py # CLI entry point (run, test, plan, build, etc.)
โโโ __init__.py / __main__.py
โ
โโโ core/ # Shared foundations
โ โโโ types.py # TypedDicts for all data structures
โ โโโ constants.py # Thresholds, patterns, score maps
โ โโโ errors.py # Exception hierarchy
โ โโโ shell.py # Subprocess helpers
โ โโโ state.py # Shift-state persistence
โ
โโโ settings/ # Configuration layer
โ โโโ config.py # Config loading and defaults
โ โโโ eval_targets.py # Repo-specific eval defaults (Phractal)
โ
โโโ owl/ # Loop 1 -- Owl (Hardening)
โ โโโ cycle.py # Single-cycle orchestrator
โ โโโ scoring.py # Diff scorer (1-10)
โ โโโ readiness.py # Production-readiness checks
โ
โโโ raven/ # Loop 2 -- Raven (Feature Builder)
โ โโโ profiler.py # Repo profiling
โ โโโ planner.py # Feature plan generation
โ โโโ decomposer.py # Plan -> waves -> sub-tasks
โ โโโ subagent.py # Sub-agent spawning
โ โโโ coordination.py # Wave coordination
โ โโโ integrator.py # Result integration
โ โโโ e2e.py # End-to-end verification
โ โโโ summary.py # Build summaries
โ โโโ feature.py # Top-level build command
โ
โโโ infra/ # Infrastructure modules
โ โโโ worktree.py # Git worktree isolation
โ โโโ multi.py # Multi-repo mode
โ โโโ module_map.py # Module-map generation
โ
โโโ schemas/ # JSON schemas
โ โโโ nightshift.schema.json
โ โโโ feature.schema.json
โ โโโ task.schema.json
โ
โโโ scripts/ # Shell wrappers
โ โโโ install.sh # Skill-bundle installer
โ โโโ run.sh / test.sh # Convenience runners
โ โโโ check.sh # Local CI gate
โ โโโ smoke-test.sh # Quick sanity check
โ
โโโ assets/
โ โโโ icon.png
โ
โโโ tests/ # Product test suite (847 tests)
โโโ test_nightshift.py
โโโ test_feature_build.py
โโโ test_module_map.py
A portable autonomous orchestration framework. Drives the daemon, role
selection, operator prompts, agent lifecycle, sub-agent reviews, and session
memory. Zero dependencies on nightshift/ -- designed to work on any codebase.
.recursive/
โโโ engine/ # Daemon runtime
โ โโโ daemon.sh # Main daemon loop (hot-reloads each cycle)
โ โโโ lib-agent.sh # Agent lifecycle, prompt guard, session utils
โ โโโ pick-role.py # Signal-driven role scoring engine
โ โโโ watchdog.sh # Process watchdog
โ โโโ format-stream.py # Stream-log formatter
โ
โโโ operators/ # Role-specific prompt sets (SKILL.md + references/)
โ โโโ build/ # Default workhorse: pick task, build, ship PR
โ โโโ review/ # Deep file-by-file code review
โ โโโ oversee/ # Task queue triage and metadata cleanup
โ โโโ strategize/ # Big-picture health report with auto-created tasks
โ โโโ achieve/ # Autonomy measurement and human-dependency elimination
โ โโโ security-check/ # Red-team preflight (read-only, runs before build)
โ
โโโ agents/ # Sub-agent prompts (specialist reviewers)
โ โโโ code-reviewer.md # Structure, types, tests, shell correctness
โ โโโ architecture-reviewer.md # Dependency flow, module boundaries, design
โ โโโ docs-reviewer.md # Changelog, handoff, tracker, cross-doc consistency
โ โโโ safety-reviewer.md # Secrets, subprocess safety, file system safety
โ โโโ meta-reviewer.md # Daemon integrity, prompt health (framework PRs only)
โ
โโโ lib/ # Shared Python helpers (zero nightshift deps)
โ โโโ cleanup.py # Log rotation, branch pruning, task archival
โ โโโ compact.py # Handoff compression
โ โโโ config.py # Project config loader
โ โโโ costs.py # Session cost tracking and budget enforcement
โ โโโ evaluation.py # Self-evaluation pipeline (10-dimension scoring)
โ
โโโ prompts/ # System prompts
โ โโโ autonomous.md # Universal rules prepended to every session
โ โโโ checkpoints.md # Optional verification pipeline checkpoints
โ
โโโ ops/ # Operations documentation
โ โโโ DAEMON.md # Daemon guide with troubleshooting
โ โโโ OPERATIONS.md # Complete system map (42KB reference)
โ โโโ PRE-PUSH-CHECKLIST.md # Safety checklist before pushing
โ โโโ ROLE-SCORING.md # Deep dive into scoring math per role
โ
โโโ scripts/ # Framework utilities
โ โโโ init.sh # Bootstrap new Recursive project
โ โโโ list-tasks.sh # Task queue display
โ โโโ rollback.sh # Revert last N commits (recovery tool)
โ โโโ validate-tasks.sh # Task YAML frontmatter validator
โ
โโโ skills/ # Skill definitions
โ โโโ setup/SKILL.md # Project setup skill
โ
โโโ templates/ # Structured-doc templates
โ โโโ handoff.md # Session handoff format
โ โโโ evaluation.md # Eval report format (10 dimensions)
โ โโโ session-index.md # Session index table header
โ โโโ task.md # Task file format (YAML frontmatter)
โ โโโ project-config.json # .recursive.json template
โ
โโโ tests/ # Framework tests (92 tests)
โโโ test_pick_role.py
14 directories of persistent state shared by both products. The daemon reads and writes these each cycle. Not checked into source control for target repos; versioned here because this repo is its own target.
.recursive/
โโโ architecture/ # Generated module map (Nightshift)
โโโ autonomy/ # Autonomy score reports (Recursive)
โโโ changelog/ # Per-version changelogs
โโโ evaluations/ # Phractal eval results (Nightshift)
โโโ handoffs/ # Session handoff summaries (Recursive)
โโโ healer/ # Healer observation logs (Recursive)
โโโ learnings/ # Hard-won knowledge index
โโโ plans/ # Feature build plans (Nightshift/Raven)
โโโ reviews/ # Code review artifacts (Recursive)
โโโ sessions/ # Session index and logs (Recursive)
โโโ strategy/ # Strategy reports (Recursive)
โโโ tasks/ # Task queue (frontmatter YAML)
โโโ vision/ # Vision documents
โโโ vision-tracker/ # Auto-generated progress tracker
Runtime/
โโโ Nightshift/ # Shift logs, state files, worktree links
Type checking is mypy --strict. Linting is Ruff. The local gate is
make check.
Nightshift shipped:
- Owl (hardening loop) with worktrees, diff scoring, and guard rails (99%)
- Raven (feature builder) with plan/build/resume/status/sub-agents (100%)
- multi-repo mode, module map generation, prompt injection boundaries
- self-evaluation against Phractal with 10-dimension scoring
Recursive shipped:
- unified daemon with signal-driven role selection across 6 operators
- red-team security-check preflight with severity-classified pentest reports
- 5-agent sub-agent review pipeline (code, architecture, docs, safety, meta)
- cross-session learnings (90+), structured handoffs, and cost tracking
- autonomy measurement and human-dependency elimination (score: 85/100)
- GitHub Issues auto-sync to internal task queue
Open in the queue (69 pending tasks):
- fix remaining real-repo evaluation gaps on rejected runs
- automate release tagging and changelog/tracker updates
- improve task queue hygiene and session-index fidelity
- budget limiter triple-failure fix (daemon cost tracking)
- add monitoring / alerting integrations
See .recursive/vision-tracker/TRACKER.md for the current scoreboard and .recursive/tasks/ for the active backlog.
- Python 3.9+
- Git
claudeCLI orcodexCLIghCLI for PR/release automationtmuxif you want long-running daemon sessions
MIT

