surf

The open framework for AI agent orchestration.

Build multi-agent systems that route queries to specialist agents,
ground every answer in your own knowledge base, and ship across
web, desktop, and mobile from a single codebase.

Quickstart • How it works • Features • Agents • Deep Dive • Contributing

Quickstart

Prerequisites

Python 3.12+
uv
just
Azure CLI (logged in)
Azure subscription with OpenAI access

Setup

az login
cd api && uv sync && cd ../ingestion && uv sync && cd ..
just setup-dev          # deploy dev Azure resources + generate .env
just dev                # start API with hot reload (auto-starts Postgres, runs migrations)

Verify

curl http://localhost:8090/api/v1/health

Note: Azure RBAC role assignments can take a few minutes to propagate after `just setup-dev`. If requests fail with 403 errors, wait a few minutes and retry.
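If you would rather script the wait than retry by hand, a polling helper along these lines may help. This is a minimal sketch, not part of the repo: `fetch_status` and `wait_until_healthy` are hypothetical names, the retry count and delay are arbitrary, and it assumes the 403 surfaces on the health endpoint itself.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8090/api/v1/health"


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, but the status code is all we need here.
        return exc.code


def wait_until_healthy(
    fetch=fetch_status,
    url: str = HEALTH_URL,
    attempts: int = 10,   # assumed value
    delay: float = 30.0,  # assumed value, in seconds
    sleep=time.sleep,
) -> bool:
    """Poll until the endpoint returns 200, retrying only on 403."""
    for attempt in range(attempts):
        status = fetch(url)
        if status == 200:
            return True
        if status != 403:
            raise RuntimeError(f"unexpected status {status} from {url}")
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "still getting 403")
```

Any status other than 200 or 403 raises immediately, since waiting is unlikely to fix it. A refused connection (API not started yet) propagates as `urllib.error.URLError`.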

Run with DevUI / Web / Desktop

just devui              # interactive agent chat with tool call visibility — port 8091
just web                # full SPA with auth, conversation history, debug panels — port 3000
just desktop            # Tauri desktop app with native window management

How it works

graph TD
  web["Web / Desktop / Mobile<br/>surf-kit + React"]
  nginx["nginx<br/>reverse proxy"]
  api["FastAPI API"]
  coordinator["Coordinator Agent<br/>claude-haiku-4-5"]
  hr["HR Agent<br/>claude-sonnet-4-6"]
  it["IT Agent<br/>claude-sonnet-4-6"]
  website["Website Agent<br/>claude-sonnet-4-6"]
  rag["Azure AI Search<br/>BM25 + Vector"]
  proofread["Proofreader<br/>claude-haiku-4-5"]
  qg["Quality Gate"]
  postgres["PostgreSQL<br/>conversations + feedback"]
  otel["OpenTelemetry<br/>Azure Monitor / OTLP"]
  langfuse["Langfuse<br/>LLM tracing"]
  keyvault["Key Vault"]

  web -->|SSE| nginx --> api
  api --> coordinator
  coordinator -->|handoff| hr
  coordinator -->|handoff| it
  coordinator -->|handoff| website
  hr --> rag
  it --> rag
  website --> rag
  hr --> qg --> proofread
  api --> postgres
  api --> otel
  api --> langfuse
  api --> keyvault

What you get


Feature	Description
Zero-registration agents	Subclass `DomainAgent` and the framework discovers, registers, and wires it automatically. No config files.
Auth-filtered routing	Agents are invisible to users who lack the required auth level. The coordinator can't even describe them.
3-strategy RAG	Hybrid search with broadened-filter fallback, keyword-only rescue, and post-response quality gates.
Prompt injection defence	Four independent layers — domain-isolated RAG, structured JSON, quality gate, source-pollution guard.
Multi-model routing	Haiku for fast coordinator decisions, Sonnet for specialist agents. Direct Anthropic or Azure AI Foundry.
Ship everywhere	Web, desktop, and mobile from one React codebase via the shared `surf-kit` component library.

Agents

Agent	Purpose	RAG Scope	Model	Auth Level
Coordinator	Routes queries, synthesises multi-domain answers	Unscoped	Haiku (fast)	Public
HR	Leave, onboarding, performance, L&D policies	`domain=hr`	Sonnet	Microsoft Account
IT	VPN, passwords, software, hardware, security	`domain=it`	Sonnet	Organisational
Website	Public-facing content, services, events	`content_source=website`	Sonnet	Public
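To illustrate how the Auth Level column could drive agent visibility, here is a rough sketch. It assumes the three levels are ordered Public < Microsoft Account < Organisational and that a higher level sees everything below it; the real `AuthLevel` enum and filtering logic live in `api/src/agents/_base.py` and `api/src/orchestrator/builder.py` and may differ. `AgentInfo` and `visible_agents` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class AuthLevel(IntEnum):
    # Assumed ordering: a higher value includes everything below it.
    PUBLIC = 0
    MICROSOFT_ACCOUNT = 1
    ORGANISATIONAL = 2


@dataclass(frozen=True)
class AgentInfo:
    name: str
    required_level: AuthLevel


# Specialist agents from the table above.
AGENTS = [
    AgentInfo("hr", AuthLevel.MICROSOFT_ACCOUNT),
    AgentInfo("it", AuthLevel.ORGANISATIONAL),
    AgentInfo("website", AuthLevel.PUBLIC),
]


def visible_agents(caller_level: AuthLevel, agents=AGENTS) -> list[str]:
    """Return the names of agents the caller is allowed to see."""
    return [a.name for a in agents if caller_level >= a.required_level]
```

Under these assumptions a guest (`AuthLevel.PUBLIC`) would only ever see the Website agent, so the coordinator's handoff graph for that caller never contains HR or IT.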

Adding a new agent

# api/src/agents/finance/agent.py
class FinanceAgent(DomainAgent):
    @property
    def name(self) -> str:
        return "finance_agent"

    @property
    def description(self) -> str:
        return "Handles budget and procurement queries"

    @property
    def rag_scope(self) -> RAGScope:
        return RAGScope(domain="finance", document_types=["policy", "procedure"])

    @property
    def system_prompt(self) -> str:
        return "You are a finance specialist..."

That's it. No registration, no config changes. The framework discovers the subclass at startup, creates its RAG tool with domain-isolated filters, and adds it to the coordinator's handoff graph. See api/src/agents/_base.py for the full interface and api/src/agents/_discovery.py for the discovery mechanism.
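For intuition, subclass discovery of this kind can be built on Python's `__subclasses__()`. The sketch below is an illustration under that assumption, not the code in `_discovery.py`; the `DomainAgent` here is a stripped-down stand-in and `discover_agents` is a hypothetical name.

```python
from abc import ABC, abstractmethod


class DomainAgent(ABC):
    """Stand-in base class; the real interface has more members."""

    @property
    @abstractmethod
    def name(self) -> str: ...


def discover_agents(base: type = DomainAgent) -> dict[str, DomainAgent]:
    """Instantiate every concrete subclass of `base`, keyed by agent name."""
    found: dict[str, DomainAgent] = {}
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        stack.extend(cls.__subclasses__())  # walk indirect subclasses too
        if getattr(cls, "__abstractmethods__", frozenset()):
            continue  # skip intermediate abstract classes
        agent = cls()
        found[agent.name] = agent
    return found


class FinanceAgent(DomainAgent):
    @property
    def name(self) -> str:
        return "finance_agent"


if __name__ == "__main__":
    print(sorted(discover_agents()))
```

`__subclasses__()` only knows about classes whose modules have been imported, so an approach like this also needs something to import each agent package at startup (for example with `pkgutil.walk_packages`).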

Deep Dive

Project Structure

surf/
  api/                  FastAPI backend — agents, orchestrator, RAG, middleware
    src/
      agents/           Domain agents + coordinator (auto-discovered)
      orchestrator/     Workflow builder, PDF processing, middleware pipeline
      rag/              Search execution, 3-strategy tool, quality gate
      routes/           Chat, auth, user profile, admin, agent listing
      services/         Conversation persistence, Graph API, streaming, response pipeline
      middleware/       Auth, rate limiting, body limits, telemetry, input validation
      config/           Settings with environment-aware validation
    tests/
      unit/             28 test modules (~7K lines)
      security/         JWT bypass, prompt injection, conversation isolation
      integration/      Multi-turn flows against real Postgres
      eval/             LLM-judged response quality suite
      load/             Locust load testing
  web/                  React 19 + Vite 7 + TailwindCSS 4 frontend
    src-tauri/          Tauri desktop app (Rust shell)
  mobile/               React Native + Expo (iOS / Android)
  ingestion/            Document pipeline — PDF, DOCX, TXT, CSV connectors
  infra/                Azure IaC — 19 Bicep modules, 1,200+ lines
    modules/            Application Insights custom module
    environments/       dev / staging / prod parameter files
    workbooks/          Azure Monitor telemetry workbook
  data/                 Sample documents and ingestion manifests

Architecture (SVG diagram)

RAG Pipeline

The RAG tool (api/src/rag/tools.py) implements a multi-strategy search pipeline:

Primary hybrid search — BM25 + vector (text-embedding-3-large) with domain-scoped OData filters
Broadened filter fallback — relaxes non-identity filters when primary returns too few results
Keyword-only rescue — drops vector search entirely for edge cases where embeddings miss
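The three strategies above form a fallback chain. A minimal sketch of that control flow, assuming each strategy is a callable and that "too few results" means fewer than a fixed threshold (the `min_results` value and the `search_with_fallback` name are assumptions, and the real tool in `api/src/rag/tools.py` may decide differently):

```python
from typing import Callable, Sequence

SearchFn = Callable[[str], Sequence[dict]]


def search_with_fallback(
    query: str,
    strategies: Sequence[tuple[str, SearchFn]],
    min_results: int = 3,  # assumed threshold
) -> tuple[str, list[dict]]:
    """Try each (label, search_fn) in order; return the first sufficient result set.

    If no strategy reaches `min_results`, return the largest result set seen.
    """
    best_label, best_hits = "none", []
    for label, search in strategies:
        hits = list(search(query))
        if len(hits) >= min_results:
            return label, hits
        if len(hits) > len(best_hits):
            best_label, best_hits = label, hits
    return best_label, best_hits
```

In use, the list would be ordered primary hybrid search, broadened-filter search, then keyword-only search, so the cheaper and more precise strategy always gets the first chance.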

Additional pipeline features:

LLM query rewriting — rewrites conversational questions into keyword-rich search queries
Chunk merging — consecutive chunks from the same document are merged to give the LLM complete context
Score normalisation — normalises across BM25 and RRF score scales
Quality gate — post-response validation catches infrastructure errors, skipped searches, ignored results, and missing sources (api/src/rag/quality_gate.py)
Source recovery — extracts and deduplicates source references from raw agent output (api/src/agents/_output.py)
Proofreading pass — a fast Haiku model fixes generation artefacts before final delivery (api/src/agents/_proofread.py)
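Chunk merging is the easiest of these features to picture in code. The sketch below assumes each search hit carries a document ID and a chunk position; the `Chunk` fields and `merge_consecutive` are hypothetical and not taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its document
    text: str


def merge_consecutive(chunks: list[Chunk]) -> list[Chunk]:
    """Join chunks that are adjacent within the same document."""
    merged: list[Chunk] = []
    last_index = None  # index of the most recent chunk folded into merged[-1]
    for chunk in sorted(chunks, key=lambda c: (c.doc_id, c.index)):
        if merged and merged[-1].doc_id == chunk.doc_id and chunk.index == last_index + 1:
            merged[-1].text += "\n" + chunk.text
        else:
            # Copy so the caller's chunks are never mutated.
            merged.append(Chunk(chunk.doc_id, chunk.index, chunk.text))
        last_index = chunk.index
    return merged
```

Sorting by document and position discards the original relevance order, so a real pipeline would likely re-rank the merged chunks afterwards.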

API Reference

Method	Endpoint	Description
`POST`	`/api/v1/chat`	Chat — returns JSON response
`POST`	`/api/v1/chat/stream`	Chat — Server-Sent Events with real-time streaming
`GET`	`/api/v1/chat/{conversation_id}`	Load conversation history
`DELETE`	`/api/v1/chat/{conversation_id}`	Delete a conversation
`POST`	`/api/v1/chat/{conversation_id}/feedback`	Record thumbs up/down + comment
`GET`	`/api/v1/agents`	List available agents (filtered by caller's auth level)
`POST`	`/api/v1/auth/guest`	Issue a guest access token
`GET`	`/api/v1/me`	User profile (JWT claims + Graph API enrichment)
`GET`	`/api/v1/me/photo`	User profile photo (via Graph API OBO)
`GET`	`/api/v1/conversations`	List conversations for the authenticated user
`GET`	`/api/v1/health`	Health check (supports `?deep=true` for component checks)
`GET`	`/api/v1/admin/`	Dev-only conversation browser dashboard

SSE Event Protocol

phase(thinking) → agent(name) → phase(generating) → delta* → phase(verifying) →
confidence → verification → usage → done → [DONE]

:keepalive comments every 5 seconds
phase(waiting) after 10 seconds of no output (e.g. during upstream 429 retry)
debug events with RAG search details (dev mode + X-Surf-Debug header)
error events with structured codes for client-side handling
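A client has to skip the keepalive comments and stop at the terminator. The parser below is a sketch that assumes standard Server-Sent Events framing (`event:` and `data:` fields, blank line between events) and that `[DONE]` arrives as a bare `data:` payload; the payload shapes are not specified here, so data is returned as raw strings. `parse_sse` is a hypothetical name.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from SSE lines, stopping at the [DONE] sentinel."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. ":keepalive"
        if line == "":
            # Blank line dispatches the buffered event.
            if data:
                payload = "\n".join(data)
                if payload == "[DONE]":
                    return
                yield event, payload
            event, data = "message", []
            continue
        field, _, value = line.partition(":")
        value = value.removeprefix(" ")
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

Feeding it the decoded lines of a streaming HTTP response yields events such as `("phase", "thinking")` or `("delta", "...")` in arrival order, which a UI can switch on by event name.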

PDF Attachments

The chat endpoint accepts PDF file attachments with tiered processing (api/src/orchestrator/pdf.py):

Tier 1 (direct vision): PDFs up to 30 pages are sent as native document content blocks
Tier 2 (text extraction): Larger PDFs get text extracted and sent as text blocks
Size limit: 100 MB with decompression bomb protection
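The tier boundaries can be summarised in a few lines. This is a sketch of the decision only, with hypothetical names; it assumes "100 MB" means 100 × 1024 × 1024 bytes and that oversized files are rejected outright. Decompression bomb protection is not modelled.

```python
MAX_BYTES = 100 * 1024 * 1024  # assumed interpretation of "100 MB"
VISION_PAGE_LIMIT = 30


def choose_pdf_tier(page_count: int, size_bytes: int) -> str:
    """Return which processing tier a PDF would fall into."""
    if size_bytes > MAX_BYTES:
        raise ValueError("PDF exceeds the 100 MB size limit")
    if page_count <= VISION_PAGE_LIMIT:
        return "direct_vision"  # Tier 1: sent as a native document block
    return "text_extraction"    # Tier 2: text extracted and sent as text blocks
```

A 30-page PDF still goes through Tier 1; a 31-page PDF falls to Tier 2.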

Security Model

Surf implements defence-in-depth. The full model is documented in docs/security-model.md.

Layer	Mechanism	Location
Authentication	Entra ID (RS256 JWKS) + guest tokens (HS256 HMAC) + dev bypass	`api/src/middleware/auth.py`
Authorisation	3-tier AuthLevel enum; agent graphs filtered per auth level	`api/src/agents/_base.py`, `api/src/orchestrator/builder.py`
Rate limiting	Per-user limits on every endpoint (slowapi)	`api/src/middleware/rate_limit.py`
Input validation	Message length cap (10K chars), control character stripping, body size limits	`api/src/middleware/input_validation.py`, `api/src/middleware/body_limit.py`
Prompt injection	Domain-isolated RAG, structured JSON enforcement, quality gate, source-pollution guard	`api/src/rag/tools.py`, `api/src/services/streaming.py`
Production guards	App refuses to start with auth disabled, debug on, wildcard CORS, or no Postgres SSL	`api/src/main.py`
Data isolation	All queries scoped to `user_id`; CASCADE deletes; conversation TTL expiry	`api/src/services/conversation.py`
Secret management	Key Vault for runtime secrets; managed identity for Azure services; OIDC for CI/CD	`infra/main.bicep`

Security tests in api/tests/security/ cover JWT bypass attempts, input injection vectors, and conversation isolation.
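As an example of the input validation layer, the sketch below applies a 10K character cap and strips control characters. It is an illustration, not the middleware itself: it assumes newlines and tabs are kept, that "control character" means Unicode category `Cc`, and that over-length messages are rejected rather than truncated. `sanitise_message` is a hypothetical name.

```python
import unicodedata

MAX_MESSAGE_CHARS = 10_000


def sanitise_message(text: str) -> str:
    """Strip control characters (keeping newline and tab), then enforce the cap."""
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    if len(cleaned) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    return cleaned
```

Stripping before measuring means padding a message with control characters cannot be used to trip the length check on otherwise valid input.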

Observability

Signal	Backend	Detail
Traces	OpenTelemetry → Azure Monitor or OTLP collector	Spans across routes, agent handoffs, RAG search, persistence
Metrics	OTel histograms + counters	Chat duration, token usage (in/out per agent), quality gate triggers, rate limit hits
LLM tracing	Langfuse v3	Per-call tracing with cost tracking; local dev stack included in `docker-compose.yml`
Dashboards	Application Insights workbook	Pre-built telemetry workbook in `infra/workbooks/api-telemetry.json`
Alerts	Azure metric alerts	Container restart, 5xx rate, CPU threshold (all in `infra/main.bicep`)

Telemetry configuration: api/src/middleware/telemetry.py. Langfuse integration: api/src/middleware/langfuse_utils.py.

Infrastructure

Surf's Azure infrastructure is defined in a single infra/main.bicep orchestrator (1,200+ lines) using Azure Verified Modules:

Resource	Module	Purpose
Log Analytics	`avm/operational-insights/workspace`	OpenTelemetry traces + structured logs
Application Insights	`modules/application-insights.bicep`	APM, telemetry workbook
Managed Identity	`avm/managed-identity`	App identity + CI identity (WIF)
Azure OpenAI	`avm/cognitive-services/account`	text-embedding-3-large (ingestion only)
Azure AI Search	`avm/search/search-service`	Hybrid BM25 + vector retrieval
Key Vault	`avm/key-vault/vault`	Secrets (API keys, client secrets, guest token HMAC)
VNet + NSGs	`avm/network/virtual-network`	Private networking with subnet isolation
Private DNS Zones	`avm/network/private-dns-zone`	DNS for Search, Storage, OpenAI private endpoints
Storage	`avm/storage/storage-account`	Document blob storage for ingestion
Container Registry	`avm/container-registry`	Container image hosting
Container Apps	Native Bicep resource	API (0-3 replicas), web (nginx), ingestion (0-1)
Metric Alerts	`avm/insights/metric-alert`	Restart, 5xx, and CPU alerts

Three environments: dev.bicepparam, staging.bicepparam, prod.bicepparam.

CI/CD

Both GitHub Actions and GitLab CI/CD pipelines are maintained:

Pipeline	GitHub Actions	GitLab CI	Trigger
API	`.github/workflows/api-ci.yml`	`.gitlab/ci/api-ci.yml`	Push to `main` (`api/**`)
Web	`.github/workflows/web-ci.yml`	`.gitlab/ci/web-ci.yml`	Push to `main` (`web/**`)
Ingestion	`.github/workflows/ingestion-ci.yml`	`.gitlab/ci/ingestion-ci.yml`	Push to `main` (`ingestion/**`)
Infra	`.github/workflows/infra-deploy.yml`	`.gitlab/ci/infra-deploy.yml`	Push to `main` (`infra/**`)
PR Checks	`.github/workflows/pr-checks.yml`	`.gitlab/ci/pr-checks.yml`	Pull/merge request

Key properties:

Zero stored secrets — GitHub uses OIDC federation; GitLab uses Workload Identity Federation via a dedicated CI managed identity provisioned in Bicep
Path-filtered — only relevant pipelines run per commit
Security scanning — Gitleaks secret scanning, pip-audit dependency auditing
Docker builds with BuildKit and multi-platform support

Ingestion Pipeline

The ingestion service (ingestion/) transforms raw documents into searchable index entries:

Stage	Description
Connectors	PDF (PyMuPDF), DOCX (python-docx), TXT, CSV parsers (`ingestion/src/connectors/`)
SharePoint sync	Graph API integration for syncing files and pages to blob storage
Chunking	Token-aware text splitting with tiktoken
Embedding	Azure OpenAI text-embedding-3-large via managed identity
Indexing	Azure AI Search with hybrid (BM25 + vector) index schema
Scheduling	Hourly indexer runs via Azure AI Search indexer pipeline
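Token-aware chunking splits on token counts rather than characters, so every chunk fits the embedding model's input budget. The sketch below shows the sliding-window idea with the tokenizer passed in as callables; the window size, overlap, and the `chunk_by_tokens` name are assumptions, and the actual splitter in `ingestion/` may work differently.

```python
from typing import Callable, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int = 512,  # assumed window size
    overlap: int = 64,      # assumed overlap between neighbouring chunks
) -> list[str]:
    """Split `text` into chunks of at most `max_tokens` tokens with overlap."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    tokens = encode(text)
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

With tiktoken installed, `enc = tiktoken.get_encoding("cl100k_base")` followed by `chunk_by_tokens(text, enc.encode, enc.decode)` would be one way to wire it up; the choice of encoding is an assumption here.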

Testing

Suite	Location	What it covers
Unit	`api/tests/unit/`	28 modules — agents, routes, middleware, RAG tool, config, output parsing, telemetry, Langfuse
Security	`api/tests/security/`	JWT bypass, prompt injection, conversation isolation
Integration	`api/tests/integration/`	Multi-turn conversation flows against real Postgres
Eval	`api/tests/eval/`	LLM-judged response quality with dataset-driven parametrisation and weighted rubric scoring
Load	`api/tests/load/`	Locust load testing (`locustfile.py`)
Smoke	`web/playwright.config.ts`	Playwright browser smoke tests
Ingestion	`ingestion/tests/`	Connector and pipeline tests

Run with: just test (unit + security), just test-integration, just eval, just smoke.

Development

Command	Description
`just dev`	Run API with hot reload (port 8090) — auto-starts Postgres and runs migrations
`just devui`	Launch DevUI — interactive agent chat with tool call tracing (port 8091)
`just web`	Run web frontend (port 3000)
`just desktop`	Run Tauri desktop app
`just test`	Run unit + security tests
`just test-integration`	Run integration tests against real Postgres
`just eval`	Run LLM-judged eval suite
`just smoke`	Run Playwright smoke tests
`just lint`	Lint all Python code (ruff)
`just typecheck`	Type-check all Python code (pyright)
`just format`	Format all Python code
`just audit`	Run pip-audit security scanning
`just otel`	Start OpenTelemetry collector for local telemetry
`just langfuse`	Start local Langfuse trace viewer at http://localhost:3100
`just admin`	Open the dev admin dashboard
`just ask "question"`	Ask the dev agent about the codebase
`just ask-repl`	Start interactive dev agent session
`just setup-dev`	Deploy dev Azure resources + generate .env
`just teardown-dev`	Delete dev Azure resources
`just deploy`	Deploy API + web containers to Azure
`just deploy-all`	Deploy infrastructure + all containers

Links


Document	Path
Security Model	docs/security-model.md
Desktop App	docs/tauri-desktop-app.md
Load Testing	api/tests/load/README.md
Contributing	CONTRIBUTING.md
Code of Conduct	CODE_OF_CONDUCT.md
Security Policy	SECURITY.md

Tech Stack

Layer	Technology
API	Python 3.12, FastAPI 0.115+, Pydantic 2, agent-framework
LLM	Anthropic Claude (Haiku routing, Sonnet specialist) — direct API or Azure AI Foundry
RAG	Azure AI Search (hybrid BM25 + vector), Azure OpenAI text-embedding-3-large
Database	PostgreSQL 17 with Alembic migrations
Web	React 19, Vite 7, TailwindCSS 4, TypeScript strict
Desktop	Tauri 2 (Rust shell + shared web frontend)
Mobile	React Native + Expo 54, NativeWind
Shared UI	surf-kit — hooks, theme, icons, agent protocol
Auth	Microsoft Entra ID (JWKS) + HMAC guest tokens + MSAL
Observability	OpenTelemetry, Azure Monitor, Langfuse v3
Infra	Bicep (Azure Verified Modules), Container Apps, VNet, Key Vault
CI/CD	GitHub Actions + GitLab CI (OIDC / WIF, zero stored secrets)
Testing	pytest, Playwright, Locust, LLM eval judge
Quality	ruff (lint + format), pyright (strict types), pip-audit, Gitleaks

Apache-2.0
