AI-native penetration testing — autonomous reconnaissance, exploitation, and verified results.
Quick Start · Architecture · Usage · Configuration · Contributing
Phantom is an autonomous AI penetration testing agent built on the ReAct (Reason–Act) loop. It connects a large language model to over 30 professional security tools, runs all offensive operations inside an isolated Docker sandbox, and produces verified vulnerability reports — entirely without human intervention.
Unlike CVE-signature scanners, Phantom reasons about your target: it reads HTTP responses, forms hypotheses, selects the right tool, chains multi-step exploits, then writes and executes a proof-of-concept script to confirm every finding before it appears in a report.
| | Traditional Scanners | Phantom |
|---|---|---|
| Approach | Signature matching against CVE databases | LLM reasoning + adaptive tool chaining |
| False Positives | 40–70% — requires manual triage | Every finding verified with a working PoC |
| Depth | Single-pass HTTP probe | Multi-phase: recon → exploit → verify |
| Adaptability | Fixed rules, static payloads | Adapts to target responses in real time |
| Novel Vulns | Known CVEs only | Logic flaws + novel attack paths |
| Reporting | Generic vulnerability lists | MITRE ATT&CK mapped, compliance-ready |
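The "verified with a working PoC" claim above boils down to a simple contract: a finding is only reported if its exploit reproduces and yields concrete evidence. A minimal sketch of that contract, with all names (`verify_finding`, `run_poc`) as hypothetical stand-ins rather than Phantom's actual API:

```python
# Sketch of "verified findings only": a candidate finding is kept only if
# re-running its PoC returns concrete evidence. Names are illustrative.

def verify_finding(finding, run_poc):
    """Re-run the PoC; keep the finding only with reproducible evidence."""
    evidence = run_poc(finding["poc"])
    if evidence:                          # e.g. raw HTTP response proving impact
        return {**finding, "verified": True, "evidence": evidence}
    return None                           # dropped as a likely false positive
```

Anything that fails to reproduce never reaches the report, which is what drives the false-positive rate toward zero.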
| 🧠 | Autonomous ReAct Loop — Plans, executes tools, reads results, re-plans. Handles dead ends and unexpected responses without human guidance. |
| 🔧 | 53 Security Tools — nmap · nuclei · sqlmap · ffuf · httpx · katana · subfinder · nikto · gobuster · arjun · semgrep · playwright — all orchestrated automatically. |
| 🐳 | Ephemeral Docker Sandbox — All offensive tooling runs in a network-restricted Kali Linux container. Zero host filesystem access. Container is destroyed after every scan. |
| ⚡ | Multi-Agent Parallelism — Spawns specialized sub-agents (SQLi, XSS, recon) that work concurrently and report findings to the coordinator. |
| 🛡️ | 7-Layer Defense Model — Scope guard → Tool firewall → Docker sandbox → Cost limiter → Time budget → HMAC audit trail → Output sanitizer. |
| ✅ | Verified Findings Only — No hallucinations. Every reported vulnerability includes raw HTTP evidence, reproduction steps, and a working exploit script. |
| 🗺️ | MITRE ATT&CK Enrichment — Automatic CWE, CAPEC, technique-level tagging, and CVSS 3.1 scoring per finding. |
| 📋 | Compliance Coverage — OWASP Top 10 (2021) · PCI DSS v4.0 · NIST 800-53 — mapped automatically per finding. |
| 💾 | Knowledge Persistence — Cross-scan memory stores hosts, past findings, and false-positive signatures. Each scan learns from the last. |
| 💰 | Full Cost Control — Per-request and per-scan budget caps. Every token and every dollar tracked in real time. |
① System Architecture — Component Overview
%%{init: {"theme": "dark"}}%%
flowchart TD
USER(["👤 User / CI-CD"])
subgraph IFACE["Interface Layer"]
CLI["CLI · TUI"]
PARSER["Output Parser"]
end
subgraph ORCH["Orchestration"]
PROFILE["Scan Profile"]
SCOPE["Scope Guard"]
COST["Cost Controller"]
AUDIT["HMAC Audit Log"]
end
subgraph AGENT["Agent Core — ReAct"]
LLM["LLM via LiteLLM"]
STATE["State Machine"]
MEM["Memory Engine"]
SKILLS["Skills Engine"]
end
subgraph SEC["Security Layer"]
FW["Tool Firewall"]
VERIFY["Verifier"]
SANIT["Sanitizer"]
end
subgraph SANDBOX["Docker Sandbox — Kali Linux"]
TSRV["Tool Server :48081"]
TOOLS["30+ Security Tools"]
BROWSER["Playwright · Chromium"]
PROXY["Caido Proxy :48080"]
end
subgraph OUTPUT["Output Pipeline"]
REPORTS["JSON · MD · HTML"]
GRAPH["Attack Graph"]
MITRE["MITRE ATT&CK Map"]
end
USER --> IFACE
IFACE --> ORCH
ORCH --> AGENT
AGENT <--> SEC
SEC --> SANDBOX
AGENT --> OUTPUT
style IFACE fill:#6c5ce7,stroke:#a29bfe,color:#ffffff
style ORCH fill:#00b894,stroke:#55efc4,color:#ffffff
style AGENT fill:#e17055,stroke:#fab1a0,color:#ffffff
style SEC fill:#d63031,stroke:#ff7675,color:#ffffff
style SANDBOX fill:#0984e3,stroke:#74b9ff,color:#ffffff
style OUTPUT fill:#f9ca24,stroke:#f0932b,color:#2d3436
② Scan Execution Flow — Phase by Phase
%%{init: {"theme": "dark"}}%%
sequenceDiagram
actor User
participant CLI as Phantom CLI
participant Orch as Orchestrator
participant Agent as Agent ReAct
participant FW as Tool Firewall
participant Box as Docker Sandbox
participant LLM as LLM Provider
participant T as Target App
User->>CLI: phantom scan -t https://app.com
CLI->>Orch: Validate scope · init cost controller
Orch->>Box: Spin up ephemeral Kali container
Orch->>Agent: Begin scan · profile + scope injected
rect rgb(48, 25, 80)
Note over Agent,LLM: Phase 1 — Reconnaissance
Agent->>LLM: Analyze target · plan recon
LLM-->>Agent: Run katana · httpx · nmap
Agent->>FW: Validate tool call
FW-->>Agent: Approved
Agent->>Box: Execute recon tools
Box->>T: HTTP probes · port scans · crawl
T-->>Box: Responses
Box-->>Agent: Endpoints · tech stack · open ports
end
rect rgb(80, 20, 20)
Note over Agent,LLM: Phase 2 — Exploitation
Agent->>LLM: Hypothesize attack vectors
LLM-->>Agent: SQLi on /api/login · XSS on /search
Agent->>Box: sqlmap · custom payload injection
Box->>T: Exploit attempts
T-->>Box: Vulnerability confirmed
Box-->>Agent: Raw HTTP evidence
end
rect rgb(15, 60, 30)
Note over Agent,LLM: Phase 3 — Verification
Agent->>Box: Re-exploit with clean PoC script
Box->>T: Reproduce exact attack
T-->>Box: Confirmed
Agent->>Agent: CVSS 3.1 · CWE tag · MITRE map
end
Agent->>CLI: Findings compiled
CLI->>User: Vulnerabilities + PoCs + Compliance
CLI->>Box: Destroy container
③ Agent ReAct Loop — Decision Cycle
%%{init: {"theme": "dark"}}%%
flowchart LR
INIT(["Scan Start"])
OBS["Observe\nCollect results"]
THINK["Reason\nAnalyze context"]
PLAN["Plan\nChoose tool"]
ACT["Act\nBuild arguments"]
FW{"Firewall?"}
EXEC["Execute\nDocker sandbox"]
DONE{"Stop\nCondition?"}
VERIFY["Verify\nRe-test findings"]
ENRICH["Enrich\nMITRE · CVSS"]
REPORT["Report\nJSON · HTML · MD"]
FINISH(["Scan Complete ☠"])
INIT --> OBS
OBS --> THINK
THINK --> PLAN
PLAN --> ACT
ACT --> FW
FW -- "✓ Pass" --> EXEC
FW -- "✗ Block" --> THINK
EXEC --> OBS
OBS --> DONE
DONE -- "Continue" --> THINK
DONE -- "Done" --> VERIFY
VERIFY --> ENRICH
ENRICH --> REPORT
REPORT --> FINISH
style INIT fill:#6c5ce7,stroke:#a29bfe,color:#fff
style FINISH fill:#6c5ce7,stroke:#a29bfe,color:#fff
style FW fill:#d63031,stroke:#ff7675,color:#fff
style DONE fill:#e17055,stroke:#fab1a0,color:#fff
style EXEC fill:#0984e3,stroke:#74b9ff,color:#fff
style REPORT fill:#00b894,stroke:#55efc4,color:#fff
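The cycle above can be sketched in a few lines of Python. All callables here (`plan_next_step`, `firewall_allows`, `run_in_sandbox`) are hypothetical stand-ins for illustration, not Phantom's actual internals:

```python
# Minimal sketch of the ReAct decision cycle: observe, reason/plan,
# firewall check, act, repeat until the stop condition. Blocked tool
# calls are fed back as observations so the agent can re-plan.

def react_loop(scan_state, plan_next_step, firewall_allows, run_in_sandbox,
               max_iterations=120):
    observations = []
    for _ in range(max_iterations):
        # Reason + Plan: the LLM proposes the next tool call from context.
        action = plan_next_step(scan_state, observations)
        if action is None:                # stop condition reached
            break
        # Firewall: rejected calls never execute; the agent sees why.
        if not firewall_allows(action):
            observations.append({"action": action, "result": "BLOCKED"})
            continue
        # Act + Observe: execute in the sandbox, collect the result.
        observations.append({"action": action,
                             "result": run_in_sandbox(action)})
    return observations
```

The key design point the diagram encodes is the `✗ Block → Reason` edge: a firewall rejection is just another observation, so the loop recovers instead of aborting.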
④ Docker Sandbox — Isolation Architecture
%%{init: {"theme": "dark"}}%%
flowchart LR
HOST(["Phantom Agent\nHost Machine"])
subgraph CONTAINER["Kali Linux Container — Network Isolated"]
TSRV["Tool Server :48081"]
PROXY["Caido Proxy :48080"]
subgraph TOOLKIT["Security Toolkit"]
SCA["nmap · masscan"]
INJ["sqlmap · nuclei"]
FUZ["ffuf · gobuster · arjun"]
WEB["httpx · katana"]
ANA["nikto · semgrep"]
end
subgraph RUNTIME["Runtime Environment"]
PY["Python 3.12"]
BR["Playwright + Chromium"]
SH["Bash Shell"]
end
end
TARGET(["Target\nApplication"])
HOST -- "Authenticated API" --> TSRV
TSRV --> TOOLKIT
TSRV --> RUNTIME
PROXY -- "Intercept + Log" --> TARGET
TOOLKIT -- "Attack traffic" --> TARGET
RUNTIME -- "Browser sessions" --> TARGET
style CONTAINER fill:#0984e3,stroke:#74b9ff,color:#ffffff
style TOOLKIT fill:#d63031,stroke:#ff7675,color:#ffffff
style RUNTIME fill:#6c5ce7,stroke:#a29bfe,color:#ffffff
style HOST fill:#2d3436,stroke:#636e72,color:#dfe6e9
style TARGET fill:#2d3436,stroke:#636e72,color:#dfe6e9
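An isolation setup like the one diagrammed implies a hardened `docker run` invocation. The flags below are standard Docker options, but the exact set Phantom uses is an assumption, not taken from its source:

```python
import shlex

def sandbox_run_cmd(image, tool_cmd, network="phantom-scan-net"):
    """Sketch of a network-restricted, ephemeral sandbox invocation."""
    return [
        "docker", "run", "--rm",           # ephemeral: removed on exit
        "--network", network,              # scan-only network, no host net
        "--cap-drop", "ALL",               # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",             # bound the process count
        image, *shlex.split(tool_cmd),
    ]

cmd = sandbox_run_cmd("ghcr.io/usta0x001/phantom-sandbox:latest",
                      "nmap -sV target.internal")
```

In practice a deployment would re-add the specific capabilities raw-socket tools need (e.g. `--cap-add NET_RAW` for nmap SYN scans) rather than run fully unprivileged.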
⑤ 7-Layer Defense Model — Request Lifecycle
%%{init: {"theme": "dark"}}%%
flowchart TD
REQ(["Incoming Request"])
L1["① Scope Validator\nTarget allowlist · SSRF protection"]
L2["② Tool Firewall\nArg sanitization · Injection block"]
L3["③ Docker Sandbox\nEphemeral Kali · Restricted Linux caps"]
L4["④ Cost Controller\nPer-request ceiling · Budget cap"]
L5["⑤ Time Limiter\nPer-tool timeout · Global scan expiry"]
L6["⑥ HMAC Audit Trail\nTamper-evident append-only log"]
L7["⑦ Output Sanitizer\nPII redaction · Credential scrubbing"]
PASS(["✓ Authorized Output"])
BLOCK(["✗ Blocked & Logged"])
REQ --> L1
L1 -- "✓ In scope" --> L2
L1 -- "✗ Out of scope" --> BLOCK
L2 -- "✓ Safe" --> L3
L2 -- "✗ Injection" --> BLOCK
L3 --> L4
L4 -- "✓ Within budget" --> L5
L4 -- "✗ Over budget" --> BLOCK
L5 -- "✓ In time" --> L6
L5 -- "✗ Timeout" --> BLOCK
L6 --> L7
L7 --> PASS
style REQ fill:#6c5ce7,stroke:#a29bfe,color:#fff
style PASS fill:#00b894,stroke:#55efc4,color:#fff
style BLOCK fill:#d63031,stroke:#ff7675,color:#fff
style L1 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L2 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L3 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L4 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L5 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L6 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L7 fill:#2d3436,stroke:#636e72,color:#dfe6e9
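The lifecycle above is essentially a chain of checks where the first failing layer wins. A sketch with three of the seven layers; the layer names match the diagram, but the function bodies are illustrative assumptions:

```python
class Blocked(Exception):
    """Raised by any layer that rejects the request; logged, never executed."""

def scope_validator(req):
    if not any(req["target"].startswith(t) for t in req["allowlist"]):
        raise Blocked("out of scope")

def tool_firewall(req):
    # Toy injection check: real arg sanitization is far more thorough.
    if any(token in req["args"] for token in (";", "|", "$(")):
        raise Blocked("argument injection")

def cost_controller(req):
    if req["spent_usd"] >= req["budget_usd"]:
        raise Blocked("over budget")

def run_layers(req, layers=(scope_validator, tool_firewall, cost_controller)):
    for layer in layers:          # first failing layer short-circuits
        layer(req)
    return "authorized"
```

Ordering matters: scope is checked before anything costs money, and nothing reaches the sandbox until every earlier layer has passed.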
Requirements: Docker · Python 3.12+ · An LLM API key
# Install
pip install phantom-agent
# or for fully isolated install:
pipx install phantom-agent
# Set your LLM
export PHANTOM_LLM="openai/gpt-4o" # any LiteLLM-supported model
export LLM_API_KEY="sk-..."
# Run your first scan
phantom -t https://your-app.com
First run pulls the sandbox image (~13 GB). This happens once. Subsequent scans start in under 10 seconds.
docker run --rm -it \
-e PHANTOM_LLM="openai/gpt-4o" \
-e LLM_API_KEY="your-key" \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/usta0x001/phantom:latest \
-t https://your-app.com
# Quick scan (~15 min) — CI/CD friendly
phantom -t https://app.com -m quick
# Standard scan (~45 min) — recommended default
phantom -t https://app.com
# Deep scan (1–3 h) — exhaustive coverage
phantom -t https://app.com -m deep
# With custom focus instructions
phantom -t https://app.com \
--instruction "Focus on SQL injection and broken auth in /api/v2"
# Resume an interrupted scan
phantom -t https://app.com
# Non-interactive (CI/CD pipelines)
phantom -t https://app.com --non-interactive
# Set a cost ceiling
PHANTOM_MAX_COST=2.00 phantom -t https://app.com
| Profile | Max Iterations | Typical Duration | Best For |
|---|---|---|---|
| `quick` | 300 | ~15–60 min | CI/CD gates, rapid triage |
| `standard` | 120 | ~20–45 min | Regular security testing (default) |
| `deep` | 300 | 1–3 hours | Full audits, compliance |
| `stealth` | 60 | ~30–60 min | Covert assessments, WAF-aware targets |
| `api_only` | 100 | ~20–45 min | REST/GraphQL API-focused scans |
Every scan produces:
phantom_runs/<target>_<id>/
├── vulnerabilities/
│ ├── vuln-0001.md # Full finding with PoC exploit
│ └── vuln-0002.md
├── audit.jsonl # HMAC-signed immutable event log
├── scan_stats.json # Cost, tokens, timing metrics
├── enhanced_state.json # Full scan state snapshot
└── vulnerabilities.csv # Summary index for triage
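The `audit.jsonl` log is described as HMAC-signed and tamper-evident. The idea can be sketched as: each line carries its event plus an HMAC over the canonical event JSON, so any post-hoc edit is detectable. The field names (`event`, `sig`) are assumptions; Phantom's real schema may differ:

```python
import hashlib, hmac, json

def sign_event(key: bytes, event: dict) -> str:
    """Produce one signed JSONL line for the audit log."""
    payload = json.dumps(event, sort_keys=True).encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return json.dumps({"event": event, "sig": sig})

def verify_log(key: bytes, lines) -> bool:
    """Recompute every line's HMAC; any mismatch means tampering."""
    for line in lines:
        rec = json.loads(line)
        payload = json.dumps(rec["event"], sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False
    return True
```

`hmac.compare_digest` is used instead of `==` to keep the comparison constant-time.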
Every scan automatically runs a 7-stage enrichment pass:
| Stage | Action |
|---|---|
| 1. MITRE ATT&CK | CWE, CAPEC, technique-level tagging |
| 2. Compliance | OWASP Top 10 · PCI DSS v4 · NIST 800-53 |
| 3. Attack Graph | Dependency-based path analysis |
| 4. Nuclei Templates | Auto-generated YAML for regression testing |
| 5. Knowledge Store | Persistent cross-scan memory updated |
| 6. Notifications | Webhook / Slack alerts for critical findings |
| 7. Reports | JSON + HTML + Markdown output |
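Conceptually, stages 1 and 2 are lookups that attach standard identifiers to each raw finding. A toy sketch with a hand-written two-entry mapping; the table contents are samples for illustration, not Phantom's knowledge base:

```python
# Illustrative enrichment pass: attach CWE / ATT&CK / OWASP tags to a
# raw finding by vulnerability class.

ENRICHMENT = {
    "sqli": {"cwe": "CWE-89", "attack": "T1190",
             "owasp": "A03:2021 Injection"},
    "xss":  {"cwe": "CWE-79", "attack": "T1059.007",
             "owasp": "A03:2021 Injection"},
}

def enrich(finding: dict) -> dict:
    """Merge standard-identifier tags into the finding; unknown types pass through."""
    return {**finding, **ENRICHMENT.get(finding["type"], {})}

report = enrich({"type": "sqli", "url": "/api/login", "severity": "high"})
```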
# .github/workflows/security.yml
name: Security Scan
on:
push:
branches: [main]
schedule:
- cron: '0 2 * * 1' # Weekly — Monday at 2 AM
jobs:
phantom-scan:
runs-on: ubuntu-latest
steps:
- name: Run Phantom
run: |
pip install phantom-agent
phantom scan \
--target ${{ vars.STAGING_URL }} \
--scan-mode quick \
--non-interactive \
--output json
env:
PHANTOM_LLM: openai/gpt-4o
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
PHANTOM_MAX_COST: "1.00"
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `PHANTOM_LLM` | LLM model (LiteLLM format) | `openai/gpt-4o` |
| `LLM_API_KEY` | API key — comma-separated for rotation | — |
| `PHANTOM_REASONING_EFFORT` | `low` / `medium` / `high` | `medium` |
| `PHANTOM_SCAN_MODE` | Default scan profile | `standard` |
| `PHANTOM_IMAGE` | Sandbox Docker image | `ghcr.io/usta0x001/phantom-sandbox:latest` |
| `PHANTOM_MAX_COST` | Hard stop when total scan cost (USD) reaches this limit | — |
| `PHANTOM_PER_REQUEST_CEILING` | Hard stop when a single LLM call exceeds this cost in USD | — |
| `LLM_MAX_TOKENS` | Override max output tokens per LLM call (scan-mode defaults: quick=4000, stealth=6000, otherwise 8000) | — |
| `PHANTOM_WEBHOOK_URL` | Webhook URL for critical alerts | — |
| `PHANTOM_DISABLE_BROWSER` | Disable the Playwright browser | `false` |
| `PHANTOM_TELEMETRY` | Enable anonymous usage telemetry | `false` |
Supported LLM Providers
Phantom uses LiteLLM — 100+ providers work out of the box:
| Provider | Example Model | Notes |
|---|---|---|
| OpenAI | `openai/gpt-4o` | Best overall quality |
| Anthropic | `anthropic/claude-opus-4-5` | Strong multi-step reasoning |
| Google | `gemini/gemini-2.5-pro` | Huge context window |
| Groq | `groq/llama-3.3-70b-versatile` | Free tier, very fast |
| DeepSeek | `deepseek/deepseek-chat` | Excellent cost efficiency |
| OpenRouter | `openrouter/deepseek/deepseek-v3.2` | Multi-provider routing |
| Ollama | `ollama/llama3.1` | Fully local — no API key required |
| Azure OpenAI | `azure/gpt-4o` | Enterprise deployments |
Phantom has undergone extensive adversarial auditing across multiple versions:
| Severity | Identified | Fixed |
|---|---|---|
| Critical | 8 | 8 |
| High | 19 | 19 |
| Medium | 34 | 34 |
| Low | 27 | 27 |
| Total | 88 | 88 |
All 88 identified issues are resolved. See CHANGELOG.md for the full history.
# Run the full test suite
pytest tests/ -v
# With coverage report
pytest tests/ --cov=phantom --cov-report=html
# Run specific categories
pytest tests/ -m "security"
pytest tests/ -m "integration"
See tests/ for the test suite. Integration and end-to-end tests require a live Docker environment.
| Resource | Description |
|---|---|
| Architecture | Deep-dive system design |
| Documentation | Full API and configuration reference |
| Contributing | Development guidelines |
| Changelog | Version history and release notes |
Phantom runs all offensive tools inside an isolated Docker sandbox container. This section covers setup for fresh installs and custom environments.
The default image is ghcr.io/usta0x001/phantom-sandbox:latest — a pre-built Kali Linux container with all security tools installed.
Requirements:
- Docker Desktop or Docker Engine installed and running
- The image is pulled automatically on first scan (~13 GB, one-time download)
# Pre-pull the image manually (optional, avoids delay on first scan)
docker pull ghcr.io/usta0x001/phantom-sandbox:latest
Override the image via environment variable or config:
# Environment variable
export PHANTOM_IMAGE="ghcr.io/usta0x001/phantom-sandbox:latest"
phantom -t https://target.com
# Or in ~/.phantom/config.yaml
phantom_image: "ghcr.io/usta0x001/phantom-sandbox:latest"
If your environment has no internet access:
# On a machine with internet access — save the image
docker pull ghcr.io/usta0x001/phantom-sandbox:latest
docker save ghcr.io/usta0x001/phantom-sandbox:latest | gzip > phantom-sandbox.tar.gz
# On the air-gapped machine — load it
docker load < phantom-sandbox.tar.gz
# Point Phantom at the loaded image
export PHANTOM_IMAGE="ghcr.io/usta0x001/phantom-sandbox:latest"
# Quick smoke test — should start a container and exit cleanly
docker run --rm ghcr.io/usta0x001/phantom-sandbox:latest nmap --version
docker run --rm ghcr.io/usta0x001/phantom-sandbox:latest nuclei --version
Contributions are welcome. See CONTRIBUTING.md for setup instructions.
- Bugs → Open an issue
- Features → Start a discussion
- PRs → Fork · branch · test · submit
Apache License 2.0 — see LICENSE.
Built on the shoulders of giants:
LiteLLM · Nuclei · SQLMap · Playwright · Textual · Rich · ffuf · Subfinder · Caido
