Two products. One repo.
Nightshift is an autonomous engineering product with two shipped loops -- Owl (hardening) and Raven (feature building).
Recursive is a portable autonomous orchestration framework that can run on any codebase.
This repo ships two independent products that work together:
Nightshift (nightshift/) is a Python package with two autonomous engineering loops:
-
Owl (Loop 1 -- Hardening, 99%): point it at a repository, let it profile the stack, create an isolated worktree, find one production-readiness issue per cycle, and either reject or commit the fix behind guard rails. Supports Codex and Claude, diff scoring, multi-repo mode, prompt injection boundaries, and self-evaluation against Phractal.
-
Raven (Loop 2 -- Feature Building, 100%): give it a feature request in plain English and it will profile the repo, plan the work, decompose it into waves, spawn sub-agents, integrate the results, run E2E and readiness checks, and persist build state for resume/status flows.
Recursive (.recursive/) is a portable autonomous orchestration framework. It provides the daemon loop, signal-driven role selection, operator prompts, agent lifecycle management, sub-agent review pipeline, and session memory. Recursive is designed to work on any codebase -- Nightshift is just the first project it operates on.
Recursive drives six operators each cycle via .recursive/engine/pick-role.py:
- Builder: reads the task queue, builds or fixes one scoped task, tests it, opens a PR, reviews it via sub-agents, and merges it
- Reviewer: picks one file, deep-reviews it against a checklist, fixes every issue found, and logs the review
- Overseer: triages the task queue, closes duplicates and obsolete work, updates stale metadata
- Strategist: gathers evidence across sessions, evaluations, and costs, then produces a top-down health report with auto-created follow-up tasks
- Achiever: measures autonomy score (0-100) across a 20-check scorecard, identifies the highest-impact human dependency, and eliminates it
- Security checker: red-team preflight that runs before each build -- scans for fragile paths, subprocess injection, credential leaks, and outputs a severity-classified pentest report
Recursive (framework) Nightshift (product)
āāāāāāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāā
ā daemon.sh ā ā owl/ (Loop 1) ā
ā pick-role.py ā ā cycle.py ā
ā lib-agent.sh ā ā scoring.py ā
ā operators/ āāāāāāā>ā readiness.py ā
ā build/ ā ā ā
ā review/ ā ā raven/ (Loop 2) ā
ā oversee/ ā ā planner.py ā
ā strategize/ ā ā decomposer.py ā
ā achieve/ ā ā subagent.py ā
ā security-check/ ā ā integrator.py ā
ā agents/ ā ā feature.py ā
ā lib/ ā ā ā
ā prompts/ ā ā core/ settings/ ā
āāāāāāāāāāāāāāāāāāāāāāā ā infra/ schemas/ ā
ā āāāāāāāāāāāāāāāāāāāāāāāā
v
.recursive/ (runtime state -- session memory for both)
Recursive orchestrates. Nightshift does the engineering work. .recursive/ is the shared memory layer.
Most of the code in this repository was written, tested, reviewed, and merged by AI agents. The Recursive daemon (.recursive/engine/daemon.sh) auto-selects an operator each cycle, and Nightshift's Owl and Raven loops do the actual engineering.
The human role is operational: start the daemon and monitor it. The agents own the engineering loop -- including deciding what to work on.
Proof points live in the repo, not in marketing copy. Check them directly:
gh pr list --state merged --limit 50 # every merged PR
cat .recursive/sessions/index.md # daemon sessions with timestamps
cat .recursive/handoffs/LATEST.md # what the last session built
make tasks # authoritative task queue summarySnapshot taken from live repo data on 2026-04-07. Generated docs such as the
vision tracker and
module map are the source of truth when these
numbers change.
| Signal | Current reading | Source |
|---|---|---|
| Overall vision progress | 92% | .recursive/vision-tracker/TRACKER.md |
| Owl (Loop 1 hardening) | 99% | .recursive/vision-tracker/TRACKER.md |
| Raven (Loop 2 feature builder) | 100% | .recursive/vision-tracker/TRACKER.md |
| Self-maintaining repo | 68% | .recursive/vision-tracker/TRACKER.md |
| Meta-prompt system | 79% | .recursive/vision-tracker/TRACKER.md |
| Tests | 847 passing | python3 -m pytest nightshift/tests/ .recursive/tests/ -q |
| Nightshift modules | 23 | .recursive/architecture/MODULE_MAP.md |
| Recursive modules | 7 | .recursive/lib/ + .recursive/engine/ |
| Merged PRs | 155+ | gh pr list --state merged --json number |
| Daemon sessions | 100+ | .recursive/sessions/index.md |
| Documented learnings | 90+ | .recursive/learnings/INDEX.md |
curl -sL https://raw.githubusercontent.com/Recusive/Nightshift/main/nightshift/scripts/install.sh | bashThis installs Nightshift's wrapper scripts and prompt assets into:
~/.codex/skills/nightshift~/.claude/skills/nightshift
The installer does not create a global nightshift shell command. In a repo
checkout, use python3 -m nightshift .... From an installed skill bundle, use
the wrapper scripts in ~/.codex/skills/nightshift/scripts/.
Add runtime artifacts to the target repo's .gitignore:
cat <<'EOF' >> .gitignore
Runtime/Nightshift/worktree-*/
Runtime/Nightshift/*.runner.log
Runtime/Nightshift/*.state.json
EOFOptional per-repo config (copy and edit):
cp .nightshift.json.example .nightshift.jsonUse the Python module entry point that the codebase actually ships:
python3 -m nightshift run --agent claude # full overnight shift (Owl)
python3 -m nightshift test --agent claude --cycles 2 # short validation shift (Owl)
python3 -m nightshift summarize # print shift state JSON
python3 -m nightshift verify-cycle --worktree-dir PATH --pre-head HASH # verify cycle offline
python3 -m nightshift plan "Add OAuth login" # plan a feature build (Raven)
python3 -m nightshift build "Add OAuth login" --yes # build a feature end-to-end (Raven)
python3 -m nightshift build --status # check build progress
python3 -m nightshift build --resume # resume interrupted build
python3 -m nightshift multi /repo1 /repo2 --agent claude --test --cycles 1 # multi-repo
python3 -m nightshift module-map --write # generate architecture mappython3 -m nightshift test ... now keeps its state files, runner logs, and
linked worktree under $TMPDIR/nightshift-test-runs/... so evaluation clones
stay clean. Full run mode still writes repo-local runtime artifacts under
Runtime/Nightshift/.
Use the bundled wrapper scripts:
~/.codex/skills/nightshift/nightshift/scripts/run.sh --agent claude
~/.codex/skills/nightshift/nightshift/scripts/test.sh --agent claude --cycles 2 --cycle-minutes 5The Recursive daemon wraps Nightshift's loops with autonomous role selection, session memory, and self-maintenance:
make daemon # start the daemon (auto-picks operator each cycle)
make tasks # show pending/blocked/in-progress task queue
make check # full local CI gate (lint + typecheck + tests)
make test # run the full test suite
make dry-run # preview cycle prompt without spawning agents
make quick-test # 2-cycle validation run (~10 min)
make clean # remove runtime artifactsDaemon examples:
tmux new-session -d -s nightshift "bash .recursive/engine/daemon.sh claude 60"
RECURSIVE_PENTEST_AGENT=codex tmux new-session -d -s nightshift "bash .recursive/engine/daemon.sh claude 60"
tmux capture-pane -t nightshift -p -S -15Abridged example. Full source of truth: .nightshift.json.example
{
"agent": "codex or claude",
"hours": 8,
"cycle_minutes": 30,
"verify_command": null,
"blocked_paths": [".github/", "deploy/", "deployment/", "infra/", "k8s/", "ops/", "terraform/", "vendor/"],
"blocked_globs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb", "Cargo.lock"],
"max_fixes_per_cycle": 3,
"max_files_per_fix": 5,
"max_files_per_cycle": 12,
"max_low_impact_fixes_per_shift": 4,
"stop_after_failed_verifications": 2,
"stop_after_empty_cycles": 2,
"score_threshold": 3,
"test_incentive_cycle": 3,
"backend_forcing_cycle": 3,
"category_balancing_cycle": 3,
"claude_model": "claude-opus-4-6",
"claude_effort": "max",
"codex_model": "gpt-5.4",
"codex_thinking": "extra_high",
"notification_webhook": null,
"readiness_checks": ["secrets", "debug_prints", "test_coverage"],
"eval_frequency": 5,
"eval_target_repo": "https://github.com/fazxes/Phractal"
}If verify_command is left null, Nightshift tries to infer one from repo
signals such as pyproject.toml, package.json, Cargo.toml, or go.mod.
Environment variables:
RECURSIVE_CLAUDE_MODEL-- override Claude model (default: claude-opus-4-6)RECURSIVE_CODEX_MODEL-- override Codex model (default: gpt-5.4)RECURSIVE_CODEX_THINKING-- Codex thinking level (default: extra_high)RECURSIVE_BUDGET-- max USD spend before daemon stopsRECURSIVE_PENTEST_AGENT-- agent for security preflight (default: same as main)RECURSIVE_PENTEST_MAX_TURNS-- max turns for pentest agentRECURSIVE_FORCE_ROLE-- bypass role scoring (build/review/oversee/strategize/achieve)RECURSIVE_PIPELINE_CHECKPOINTS-- enable verification checkpoints (0/1)
The daemon reads live system signals each cycle and scores all five selectable roles. The highest score wins, with tie-break favoring build. Key signals:
| Signal | Effect |
|---|---|
| 5+ consecutive builds | Triggers review |
| 50+ pending tasks | Triggers oversee |
| 15+ sessions since last strategy | Triggers strategize |
| Autonomy score < 70 | Triggers achieve |
| Urgent tasks in queue | Boosts build |
Security-check runs as a preflight before every build -- it is not scored.
Override with RECURSIVE_FORCE_ROLE=review to bypass scoring.
Full scoring math: .recursive/ops/ROLE-SCORING.md.
Both products are designed for stateless agents, so the repo carries the memory
in .recursive/:
- Handoffs: every session writes a structured summary to
.recursive/handoffs/, and the next session starts fromLATEST.md - Learnings: agents read
.recursive/learnings/INDEX.mdfirst (90+ hard-won patterns), then open only the relevant learning files - Task queue: work lives in
.recursive/tasks/; urgent pending tasks outrank normal ones, then the queue falls back to lowest-numbered pending internal work. GitHub Issues with thetasklabel are auto-synced. - Evaluations: periodically runs Nightshift against Phractal and scores across 10 dimensions; low scores become tracked follow-up tasks
- Session index: every session is logged with timestamp, role, exit code, duration, cost, feature, and PR link
cat .recursive/handoffs/LATEST.md
cat .recursive/learnings/INDEX.md
make tasks
ls .recursive/evaluations/
cat .recursive/sessions/index.mdHumans can add work by opening GitHub issues with the task label:
gh issue create --title "Add dark mode" --label "task"
gh issue create --title "Fix CI" --label "task,urgent"Nightshift does not trust the model to "be careful." It verifies:
- commit + shift-log presence after every cycle
- blocked-path and lockfile violations (8 blocked paths, 6 lockfile patterns)
- repo verification commands (auto-inferred or configured)
- file deletion attempts
- repeated category or path tunnel vision (category balancing)
- circuit breaker: stops after 3 consecutive failures
Recursive adds its own layer:
- prompt/control-file modifications during self-maintenance (prompt guard)
- origin integrity checks (detects pushes that bypass the working tree)
- session cost tracking and budget enforcement
Accepted fixes are scored 1-10 for production impact using category weight
(Security: 8, Error Handling: 6, Tests: 6, A11y: 5, etc.), diff content
analysis, test file bonuses, and multi-category bonuses. Below threshold
(default 3): revert the cycle. Above threshold: keep the commit.
Instruction files from target repos (CLAUDE.md, AGENTS.md, etc.) are wrapped
in an untrusted boundary before the agent sees them. Symlinks are rejected,
files > 100KB are truncated, and total instruction context is capped at 200KB.
They are treated as coding convention references only, never as behavioral
directives.
Before builder work starts, Recursive snapshots all framework control files
(operator SKILL.mds, daemon.sh, autonomous.md, etc.), runs a red-team
security-check preflight, and hard-resets back to origin/main before the
main session. After the session, it compares pre/post snapshots and surfaces
any control-file diff as an alert in the next cycle's prompt.
Session costs are parsed from agent stream-json logs. Per-session and cumulative
costs are tracked in .recursive/sessions/. Budget enforcement via
RECURSIVE_BUDGET can stop the daemon when cumulative spend exceeds the limit.
The product Python package: 23 production modules across 5 subdirectories. The generated module map is the authoritative inventory.
nightshift/
āāā cli.py # CLI entry point (run, test, plan, build, etc.)
āāā __init__.py / __main__.py
ā
āāā core/ # Shared foundations
ā āāā types.py # TypedDicts for all data structures
ā āāā constants.py # Thresholds, patterns, score maps
ā āāā errors.py # Exception hierarchy
ā āāā shell.py # Subprocess helpers
ā āāā state.py # Shift-state persistence
ā
āāā settings/ # Configuration layer
ā āāā config.py # Config loading and defaults
ā āāā eval_targets.py # Repo-specific eval defaults (Phractal)
ā
āāā owl/ # Loop 1 -- Owl (Hardening)
ā āāā cycle.py # Single-cycle orchestrator
ā āāā scoring.py # Diff scorer (1-10)
ā āāā readiness.py # Production-readiness checks
ā
āāā raven/ # Loop 2 -- Raven (Feature Builder)
ā āāā profiler.py # Repo profiling
ā āāā planner.py # Feature plan generation
ā āāā decomposer.py # Plan -> waves -> sub-tasks
ā āāā subagent.py # Sub-agent spawning
ā āāā coordination.py # Wave coordination
ā āāā integrator.py # Result integration
ā āāā e2e.py # End-to-end verification
ā āāā summary.py # Build summaries
ā āāā feature.py # Top-level build command
ā
āāā infra/ # Infrastructure modules
ā āāā worktree.py # Git worktree isolation
ā āāā multi.py # Multi-repo mode
ā āāā module_map.py # Module-map generation
ā
āāā schemas/ # JSON schemas
ā āāā nightshift.schema.json
ā āāā feature.schema.json
ā āāā task.schema.json
ā
āāā scripts/ # Shell wrappers
ā āāā install.sh # Skill-bundle installer
ā āāā run.sh / test.sh # Convenience runners
ā āāā check.sh # Local CI gate
ā āāā smoke-test.sh # Quick sanity check
ā
āāā assets/
ā āāā icon.png
ā
āāā tests/ # Product test suite (847 tests)
āāā test_nightshift.py
āāā test_feature_build.py
āāā test_module_map.py
A portable autonomous orchestration framework. Drives the daemon, role
selection, operator prompts, agent lifecycle, sub-agent reviews, and session
memory. Zero dependencies on nightshift/ -- designed to work on any codebase.
.recursive/
āāā engine/ # Daemon runtime
ā āāā daemon.sh # Main daemon loop (hot-reloads each cycle)
ā āāā lib-agent.sh # Agent lifecycle, prompt guard, session utils
ā āāā pick-role.py # Signal-driven role scoring engine
ā āāā watchdog.sh # Process watchdog
ā āāā format-stream.py # Stream-log formatter
ā
āāā operators/ # Role-specific prompt sets (SKILL.md + references/)
ā āāā build/ # Default workhorse: pick task, build, ship PR
ā āāā review/ # Deep file-by-file code review
ā āāā oversee/ # Task queue triage and metadata cleanup
ā āāā strategize/ # Big-picture health report with auto-created tasks
ā āāā achieve/ # Autonomy measurement and human-dependency elimination
ā āāā security-check/ # Red-team preflight (read-only, runs before build)
ā
āāā agents/ # Sub-agent prompts (specialist reviewers)
ā āāā code-reviewer.md # Structure, types, tests, shell correctness
ā āāā architecture-reviewer.md # Dependency flow, module boundaries, design
ā āāā docs-reviewer.md # Changelog, handoff, tracker, cross-doc consistency
ā āāā safety-reviewer.md # Secrets, subprocess safety, file system safety
ā āāā meta-reviewer.md # Daemon integrity, prompt health (framework PRs only)
ā
āāā lib/ # Shared Python helpers (zero nightshift deps)
ā āāā cleanup.py # Log rotation, branch pruning, task archival
ā āāā compact.py # Handoff compression
ā āāā config.py # Project config loader
ā āāā costs.py # Session cost tracking and budget enforcement
ā āāā evaluation.py # Self-evaluation pipeline (10-dimension scoring)
ā
āāā prompts/ # System prompts
ā āāā autonomous.md # Universal rules prepended to every session
ā āāā checkpoints.md # Optional verification pipeline checkpoints
ā
āāā ops/ # Operations documentation
ā āāā DAEMON.md # Daemon guide with troubleshooting
ā āāā OPERATIONS.md # Complete system map (42KB reference)
ā āāā PRE-PUSH-CHECKLIST.md # Safety checklist before pushing
ā āāā ROLE-SCORING.md # Deep dive into scoring math per role
ā
āāā scripts/ # Framework utilities
ā āāā init.sh # Bootstrap new Recursive project
ā āāā list-tasks.sh # Task queue display
ā āāā rollback.sh # Revert last N commits (recovery tool)
ā āāā validate-tasks.sh # Task YAML frontmatter validator
ā
āāā skills/ # Skill definitions
ā āāā setup/SKILL.md # Project setup skill
ā
āāā templates/ # Structured-doc templates
ā āāā handoff.md # Session handoff format
ā āāā evaluation.md # Eval report format (10 dimensions)
ā āāā session-index.md # Session index table header
ā āāā task.md # Task file format (YAML frontmatter)
ā āāā project-config.json # .recursive.json template
ā
āāā tests/ # Framework tests (92 tests)
āāā test_pick_role.py
14 directories of persistent state shared by both products. The daemon reads and writes these each cycle. Not checked into source control for target repos; versioned here because this repo is its own target.
.recursive/
āāā architecture/ # Generated module map (Nightshift)
āāā autonomy/ # Autonomy score reports (Recursive)
āāā changelog/ # Per-version changelogs
āāā evaluations/ # Phractal eval results (Nightshift)
āāā handoffs/ # Session handoff summaries (Recursive)
āāā healer/ # Healer observation logs (Recursive)
āāā learnings/ # Hard-won knowledge index
āāā plans/ # Feature build plans (Nightshift/Raven)
āāā reviews/ # Code review artifacts (Recursive)
āāā sessions/ # Session index and logs (Recursive)
āāā strategy/ # Strategy reports (Recursive)
āāā tasks/ # Task queue (frontmatter YAML)
āāā vision/ # Vision documents
āāā vision-tracker/ # Auto-generated progress tracker
Runtime/
āāā Nightshift/ # Shift logs, state files, worktree links
Type checking is mypy --strict. Linting is Ruff. The local gate is
make check.
Nightshift shipped:
- Owl (hardening loop) with worktrees, diff scoring, and guard rails (99%)
- Raven (feature builder) with plan/build/resume/status/sub-agents (100%)
- multi-repo mode, module map generation, prompt injection boundaries
- self-evaluation against Phractal with 10-dimension scoring
Recursive shipped:
- unified daemon with signal-driven role selection across 6 operators
- red-team security-check preflight with severity-classified pentest reports
- 5-agent sub-agent review pipeline (code, architecture, docs, safety, meta)
- cross-session learnings (90+), structured handoffs, and cost tracking
- autonomy measurement and human-dependency elimination (score: 85/100)
- GitHub Issues auto-sync to internal task queue
Open in the queue (69 pending tasks):
- fix remaining real-repo evaluation gaps on rejected runs
- automate release tagging and changelog/tracker updates
- improve task queue hygiene and session-index fidelity
- budget limiter triple-failure fix (daemon cost tracking)
- add monitoring / alerting integrations
See .recursive/vision-tracker/TRACKER.md for the current scoreboard and .recursive/tasks/ for the active backlog.
- Python 3.9+
- Git
claudeCLI orcodexCLIghCLI for PR/release automationtmuxif you want long-running daemon sessions
MIT

