surf

The open framework for extensible & grounded AI agent orchestration.

README

The open framework for AI agent orchestration.

Build multi-agent systems that route queries to specialist agents,
ground every answer in your own knowledge base, and ship across
web, desktop, and mobile from a single codebase.

Quickstart  •  How it works  •  Features  •  Agents  •  Deep Dive  •  Contributing


Quickstart

Prerequisites

Setup

az login
cd api && uv sync && cd ../ingestion && uv sync && cd ..
just setup-dev          # deploy dev Azure resources + generate .env
just dev                # start API with hot reload (auto-starts Postgres, runs migrations)

Verify

curl http://localhost:8090/api/v1/health

Note: RBAC role propagation can take a few minutes. If you get 403 errors, wait and retry.
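The wait-and-retry advice can be scripted. The following is a minimal sketch, not part of surf: the function names, the attempt count, and the 30-second delay are illustrative assumptions, and it assumes the 403 surfaces on the URL being polled.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx; the status code is on the exception.
        return exc.code


def wait_until_healthy(
    url: str,
    fetch: Callable[[str], int] = fetch_status,
    attempts: int = 10,          # assumption: not a value from the README
    delay: float = 30.0,         # assumption: "a few minutes" spread over retries
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the endpoint returns 200, retrying only on 403."""
    for attempt in range(attempts):
        status = fetch(url)
        if status == 200:
            return True
        if status != 403:
            raise RuntimeError(f"unexpected status {status} from {url}")
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8090/api/v1/health")` returns `True` once the endpoint answers 200 and `False` if every attempt returned 403.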

Run with DevUI / Web / Desktop
just devui              # interactive agent chat with tool call visibility — port 8091
just web                # full SPA with auth, conversation history, debug panels — port 3000
just desktop            # Tauri desktop app with native window management

How it works

graph TD
  web["Web / Desktop / Mobile<br/>surf-kit + React"]
  nginx["nginx<br/>reverse proxy"]
  api["FastAPI API"]
  coordinator["Coordinator Agent<br/>claude-haiku-4-5"]
  hr["HR Agent<br/>claude-sonnet-4-6"]
  it["IT Agent<br/>claude-sonnet-4-6"]
  website["Website Agent<br/>claude-sonnet-4-6"]
  rag["Azure AI Search<br/>BM25 + Vector"]
  proofread["Proofreader<br/>claude-haiku-4-5"]
  qg["Quality Gate"]
  postgres["PostgreSQL<br/>conversations + feedback"]
  otel["OpenTelemetry<br/>Azure Monitor / OTLP"]
  langfuse["Langfuse<br/>LLM tracing"]
  keyvault["Key Vault"]

  web -->|SSE| nginx --> api
  api --> coordinator
  coordinator -->|handoff| hr
  coordinator -->|handoff| it
  coordinator -->|handoff| website
  hr --> rag
  it --> rag
  website --> rag
  hr --> qg --> proofread
  api --> postgres
  api --> otel
  api --> langfuse
  api --> keyvault

What you get

Zero-registration agents: Subclass DomainAgent and the framework discovers, registers, and wires it automatically. No config files.
Auth-filtered routing: Agents are invisible to users who lack the required auth level. The coordinator can't even describe them.
3-strategy RAG: Hybrid search with broadened-filter fallback, keyword-only rescue, and post-response quality gates.
Prompt injection defence: Four independent layers (domain-isolated RAG, structured JSON, quality gate, source-pollution guard).
Multi-model routing: Haiku for fast coordinator decisions, Sonnet for specialist agents. Direct Anthropic or Azure AI Foundry.
Ship everywhere: Web, desktop, and mobile from one React codebase via the shared surf-kit component library.

Agents

| Agent | Purpose | RAG Scope | Model | Auth Level |
| --- | --- | --- | --- | --- |
| Coordinator | Routes queries, synthesises multi-domain answers | Unscoped | Haiku (fast) | Public |
| HR | Leave, onboarding, performance, L&D policies | domain=hr | Sonnet | Microsoft Account |
| IT | VPN, passwords, software, hardware, security | domain=it | Sonnet | Organisational |
| Website | Public-facing content, services, events | content_source=website | Sonnet | Public |
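
The Auth Level column drives which agents a caller can see. The following is a minimal sketch of that filter, not surf's actual code: the ordering of the three levels, the `AgentInfo` type, and the agent names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class AuthLevel(IntEnum):
    # Assumption: levels are ordered, and a higher value means more access.
    PUBLIC = 0
    MICROSOFT_ACCOUNT = 1
    ORGANISATIONAL = 2


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical record pairing an agent with its required level."""
    name: str
    required_level: AuthLevel


# Agent names here are illustrative; levels follow the table above.
AGENTS = [
    AgentInfo("hr_agent", AuthLevel.MICROSOFT_ACCOUNT),
    AgentInfo("it_agent", AuthLevel.ORGANISATIONAL),
    AgentInfo("website_agent", AuthLevel.PUBLIC),
]


def visible_agents(caller_level: AuthLevel, agents=AGENTS) -> list[str]:
    """Return only the agents the caller is allowed to see."""
    return [a.name for a in agents if caller_level >= a.required_level]


print(visible_agents(AuthLevel.PUBLIC))
print(visible_agents(AuthLevel.ORGANISATIONAL))
```

Filtering the list before the coordinator is built, rather than rejecting handoffs afterwards, is what would keep hidden agents out of the coordinator's descriptions entirely.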
Adding a new agent
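# api/src/agents/finance/agent.py
# Assumption: both names are exported by api/src/agents/_base.py; adjust the import if RAGScope lives elsewhere.
from .._base import DomainAgent, RAGScope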
class FinanceAgent(DomainAgent):
    @property
    def name(self) -> str:
        return "finance_agent"

    @property
    def description(self) -> str:
        return "Handles budget and procurement queries"

    @property
    def rag_scope(self) -> RAGScope:
        return RAGScope(domain="finance", document_types=["policy", "procedure"])

    @property
    def system_prompt(self) -> str:
        return "You are a finance specialist..."

That's it. No registration, no config changes. The framework discovers the subclass at startup, creates its RAG tool with domain-isolated filters, and adds it to the coordinator's handoff graph. See api/src/agents/_base.py for the full interface and api/src/agents/_discovery.py for the discovery mechanism.
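
One common way to get this kind of registration-free discovery in Python is to walk the base class's subclass tree. The following is a minimal sketch of that technique with a stand-in `DomainAgent`. It is an assumption about the general approach, not a copy of api/src/agents/_discovery.py, and `discover_agents` is a hypothetical name.

```python
from abc import ABC, abstractmethod


class DomainAgent(ABC):
    """Stand-in for the real base class; only two properties are modelled."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...


class FinanceAgent(DomainAgent):
    @property
    def name(self) -> str:
        return "finance_agent"

    @property
    def description(self) -> str:
        return "Handles budget and procurement queries"


def discover_agents(base: type = DomainAgent) -> dict[str, DomainAgent]:
    """Instantiate every concrete subclass of `base`, keyed by agent name."""
    found: dict[str, DomainAgent] = {}
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        stack.extend(cls.__subclasses__())
        if getattr(cls, "__abstractmethods__", None):
            continue  # skip intermediate abstract classes
        agent = cls()
        found[agent.name] = agent
    return found


registry = discover_agents()
print(sorted(registry))
```

A subclass only appears in `__subclasses__()` once its module has been imported, so a discovery step built this way also needs to import the agent packages first.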


Deep Dive

Project Structure
surf/
  api/                  FastAPI backend — agents, orchestrator, RAG, middleware
    src/
      agents/           Domain agents + coordinator (auto-discovered)
      orchestrator/     Workflow builder, PDF processing, middleware pipeline
      rag/              Search execution, 3-strategy tool, quality gate
      routes/           Chat, auth, user profile, admin, agent listing
      services/         Conversation persistence, Graph API, streaming, response pipeline
      middleware/       Auth, rate limiting, body limits, telemetry, input validation
      config/           Settings with environment-aware validation
    tests/
      unit/             28 test modules (~7K lines)
      security/         JWT bypass, prompt injection, conversation isolation
      integration/      Multi-turn flows against real Postgres
      eval/             LLM-judged response quality suite
      load/             Locust load testing
  web/                  React 19 + Vite 7 + TailwindCSS 4 frontend
    src-tauri/          Tauri desktop app (Rust shell)
  mobile/               React Native + Expo (iOS / Android)
  ingestion/            Document pipeline — PDF, DOCX, TXT, CSV connectors
  infra/                Azure IaC — 19 Bicep modules, 1,200+ lines
    modules/            Application Insights custom module
    environments/       dev / staging / prod parameter files
    workbooks/          Azure Monitor telemetry workbook
  data/                 Sample documents and ingestion manifests
Architecture (SVG diagram)

[Figure: Surf Architecture Overview]

RAG Pipeline

The RAG tool (api/src/rag/tools.py) implements a multi-strategy search pipeline:

  1. Primary hybrid search — BM25 + vector (text-embedding-3-large) with domain-scoped OData filters
  2. Broadened filter fallback — relaxes non-identity filters when primary returns too few results
  3. Keyword-only rescue — drops vector search entirely for edge cases where embeddings miss
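
The three strategies above form a fallback chain. The following is a minimal sketch of that control flow with stubbed search functions. The `min_results` threshold, the function names, and the result shape are assumptions, since the README does not say how "too few results" is defined.

```python
from typing import Callable, Sequence

SearchFn = Callable[[str], list[dict]]


def search_with_fallback(
    query: str,
    strategies: Sequence[tuple[str, SearchFn]],
    min_results: int = 3,  # assumption: the real threshold is not documented here
) -> tuple[str, list[dict]]:
    """Try each strategy in order and return the first with enough hits.

    If no strategy reaches the threshold, return the one with the most hits.
    """
    best_name, best_hits = "none", []
    for name, search in strategies:
        hits = search(query)
        if len(hits) >= min_results:
            return name, hits
        if len(hits) > len(best_hits):
            best_name, best_hits = name, hits
    return best_name, best_hits


# Stubs standing in for the real Azure AI Search calls.
def primary(query: str) -> list[dict]:
    return []


def broadened(query: str) -> list[dict]:
    return [{"id": "a"}]


def keyword_only(query: str) -> list[dict]:
    return [{"id": "a"}, {"id": "b"}, {"id": "c"}]


strategy, hits = search_with_fallback(
    "vpn reset",
    [("hybrid", primary), ("broadened", broadened), ("keyword", keyword_only)],
)
print(strategy, len(hits))
```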

Additional pipeline features:

  • LLM query rewriting — rewrites conversational questions into keyword-rich search queries
  • Chunk merging — consecutive chunks from the same document are merged to give the LLM complete context
  • Score normalisation — normalises across BM25 and RRF score scales
  • Quality gate — post-response validation catches infrastructure errors, skipped searches, ignored results, and missing sources (api/src/rag/quality_gate.py)
  • Source recovery — extracts and deduplicates source references from raw agent output (api/src/agents/_output.py)
  • Proofreading pass — a fast Haiku model fixes generation artefacts before final delivery (api/src/agents/_proofread.py)
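
Of the features above, chunk merging is the easiest to illustrate in isolation. The following is a minimal sketch, assuming each search hit carries a document id and a chunk position. The `Chunk` and `Merged` types and the newline join are illustrative choices, not surf's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its document (assumed field)
    text: str


@dataclass
class Merged:
    doc_id: str
    start: int
    end: int
    text: str


def merge_consecutive(chunks: list[Chunk]) -> list[Merged]:
    """Merge chunks that sit next to each other in the same document."""
    merged: list[Merged] = []
    for chunk in sorted(chunks, key=lambda c: (c.doc_id, c.index)):
        last = merged[-1] if merged else None
        if last is not None and last.doc_id == chunk.doc_id and chunk.index == last.end + 1:
            # Adjacent to the previous run: extend it.
            last.end = chunk.index
            last.text += "\n" + chunk.text
        else:
            merged.append(Merged(chunk.doc_id, chunk.index, chunk.index, chunk.text))
    return merged


hits = [
    Chunk("a", 2, "C"),
    Chunk("a", 0, "A"),
    Chunk("a", 1, "B"),
    Chunk("b", 5, "X"),
    Chunk("a", 4, "E"),
]
for run in merge_consecutive(hits):
    print(run.doc_id, run.start, run.end, repr(run.text))
```

Sorting by document and position discards the original relevance order, so a real pipeline would likely carry a score on each merged run and re-rank afterwards.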
API Reference
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/chat | Chat, returns JSON response |
| POST | /api/v1/chat/stream | Chat, Server-Sent Events with real-time streaming |
| GET | /api/v1/chat/{conversation_id} | Load conversation history |
| DELETE | /api/v1/chat/{conversation_id} | Delete a conversation |
| POST | /api/v1/chat/{conversation_id}/feedback | Record thumbs up/down + comment |
| GET | /api/v1/agents | List available agents (filtered by caller's auth level) |
| POST | /api/v1/auth/guest | Issue a guest access token |
| GET | /api/v1/me | User profile (JWT claims + Graph API enrichment) |
| GET | /api/v1/me/photo | User profile photo (via Graph API OBO) |
| GET | /api/v1/conversations | List conversations for the authenticated user |
| GET | /api/v1/health | Health check (supports ?deep=true for component checks) |
| GET | /api/v1/admin/ | Dev-only conversation browser dashboard |

SSE Event Protocol

phase(thinking) → agent(name) → phase(generating) → delta* → phase(verifying) →
confidence → verification → usage → done → [DONE]
  • :keepalive comments every 5 seconds
  • phase(waiting) after 10 seconds of no output (e.g. during upstream 429 retry)
  • debug events with RAG search details (dev mode + X-Surf-Debug header)
  • error events with structured codes for client-side handling
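
A client consuming this stream needs to skip the keepalive comments and stop at the terminator. The following is a minimal sketch of a parser for standard Server-Sent Events framing. It assumes event names travel in the `event:` field and that `[DONE]` arrives as a bare `data:` payload, and the JSON payloads in the sample are invented for illustration.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from raw SSE lines until [DONE]."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. ":keepalive"
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data:
                payload = "\n".join(data)
                if payload == "[DONE]":
                    return
                yield event, payload
            event, data = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Sample stream; the payload shapes are assumptions.
stream = [
    ":keepalive\n",
    "event: phase\n", 'data: {"phase": "thinking"}\n', "\n",
    "event: delta\n", 'data: {"text": "Hi"}\n', "\n",
    "event: done\n", "data: {}\n", "\n",
    "data: [DONE]\n", "\n",
    "event: delta\n", "data: ignored\n", "\n",
]
events = list(parse_sse(stream))
for name, payload in events:
    print(name, payload)
```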

PDF Attachments

The chat endpoint accepts PDF file attachments with tiered processing (api/src/orchestrator/pdf.py):

  • Tier 1 (direct vision): PDFs up to 30 pages are sent as native document content blocks
  • Tier 2 (text extraction): Larger PDFs get text extracted and sent as text blocks
  • Size limit: 100 MB with decompression bomb protection
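
The tiering rule reduces to a small decision function. The following is a minimal sketch under stated assumptions: the tier labels and function name are hypothetical, the limit is taken as 100 × 1024 × 1024 bytes, and decompression bomb protection is left out.

```python
MAX_VISION_PAGES = 30
# Assumption: "100 MB" is read as 100 MiB; the real cutoff may differ.
MAX_BYTES = 100 * 1024 * 1024


def choose_pdf_tier(page_count: int, size_bytes: int) -> str:
    """Pick a processing tier for an uploaded PDF."""
    if size_bytes > MAX_BYTES:
        raise ValueError("PDF exceeds the 100 MB size limit")
    if page_count <= MAX_VISION_PAGES:
        return "direct_vision"    # Tier 1: send as a native document block
    return "text_extraction"      # Tier 2: extract text and send text blocks


print(choose_pdf_tier(12, 2_000_000))
print(choose_pdf_tier(120, 2_000_000))
```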
Security Model

Surf implements defence-in-depth. The full model is documented in docs/security-model.md.

| Layer | Mechanism | Location |
| --- | --- | --- |
| Authentication | Entra ID (RS256 JWKS) + guest tokens (HS256 HMAC) + dev bypass | api/src/middleware/auth.py |
| Authorisation | 3-tier AuthLevel enum; agent graphs filtered per auth level | api/src/agents/_base.py, api/src/orchestrator/builder.py |
| Rate limiting | Per-user limits on every endpoint (slowapi) | api/src/middleware/rate_limit.py |
| Input validation | Message length cap (10K chars), control character stripping, body size limits | api/src/middleware/input_validation.py, api/src/middleware/body_limit.py |
| Prompt injection | Domain-isolated RAG, structured JSON enforcement, quality gate, source-pollution guard | api/src/rag/tools.py, api/src/services/streaming.py |
| Production guards | App refuses to start with auth disabled, debug on, wildcard CORS, or no Postgres SSL | api/src/main.py |
| Data isolation | All queries scoped to user_id; CASCADE deletes; conversation TTL expiry | api/src/services/conversation.py |
| Secret management | Key Vault for runtime secrets; managed identity for Azure services; OIDC for CI/CD | infra/main.bicep |

Security tests in api/tests/security/ cover JWT bypass attempts, input injection vectors, and conversation isolation.
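
The input validation layer is simple enough to sketch. The following is a hedged illustration, not the code in api/src/middleware/input_validation.py: which control characters are stripped, whether newlines and tabs survive, and whether an over-long message is rejected or truncated are all assumptions.

```python
import unicodedata

MAX_MESSAGE_CHARS = 10_000  # the 10K cap from the table above


def sanitise_message(message: str) -> str:
    """Strip control characters, then enforce the length cap."""
    # Assumption: keep newlines and tabs, drop every other character in
    # Unicode category Cc (control).
    cleaned = "".join(
        ch for ch in message
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Assumption: over-long input is rejected rather than truncated.
    if len(cleaned) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    return cleaned


print(repr(sanitise_message("hi\x00 there\x07\n")))
```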

Observability
| Signal | Backend | Detail |
| --- | --- | --- |
| Traces | OpenTelemetry → Azure Monitor or OTLP collector | Spans across routes, agent handoffs, RAG search, persistence |
| Metrics | OTel histograms + counters | Chat duration, token usage (in/out per agent), quality gate triggers, rate limit hits |
| LLM tracing | Langfuse v3 | Per-call tracing with cost tracking; local dev stack included in docker-compose.yml |
| Dashboards | Application Insights workbook | Pre-built telemetry workbook in infra/workbooks/api-telemetry.json |
| Alerts | Azure metric alerts | Container restart, 5xx rate, CPU threshold (all in infra/main.bicep) |

Telemetry configuration: api/src/middleware/telemetry.py. Langfuse integration: api/src/middleware/langfuse_utils.py.

Infrastructure

Surf's Azure infrastructure is defined in a single infra/main.bicep orchestrator (1,200+ lines) using Azure Verified Modules:

| Resource | Module | Purpose |
| --- | --- | --- |
| Log Analytics | avm/operational-insights/workspace | OpenTelemetry traces + structured logs |
| Application Insights | modules/application-insights.bicep | APM, telemetry workbook |
| Managed Identity | avm/managed-identity | App identity + CI identity (WIF) |
| Azure OpenAI | avm/cognitive-services/account | text-embedding-3-large (ingestion only) |
| Azure AI Search | avm/search/search-service | Hybrid BM25 + vector retrieval |
| Key Vault | avm/key-vault/vault | Secrets (API keys, client secrets, guest token HMAC) |
| VNet + NSGs | avm/network/virtual-network | Private networking with subnet isolation |
| Private DNS Zones | avm/network/private-dns-zone | DNS for Search, Storage, OpenAI private endpoints |
| Storage | avm/storage/storage-account | Document blob storage for ingestion |
| Container Registry | avm/container-registry | Container image hosting |
| Container Apps | Native Bicep resource | API (0-3 replicas), web (nginx), ingestion (0-1) |
| Metric Alerts | avm/insights/metric-alert | Restart, 5xx, and CPU alerts |

Three environments: dev.bicepparam, staging.bicepparam, prod.bicepparam.

CI/CD

Both GitHub Actions and GitLab CI/CD pipelines are maintained:

| Pipeline | GitHub Actions | GitLab CI | Trigger |
| --- | --- | --- | --- |
| API | .github/workflows/api-ci.yml | .gitlab/ci/api-ci.yml | Push to main (api/**) |
| Web | .github/workflows/web-ci.yml | .gitlab/ci/web-ci.yml | Push to main (web/**) |
| Ingestion | .github/workflows/ingestion-ci.yml | .gitlab/ci/ingestion-ci.yml | Push to main (ingestion/**) |
| Infra | .github/workflows/infra-deploy.yml | .gitlab/ci/infra-deploy.yml | Push to main (infra/**) |
| PR Checks | .github/workflows/pr-checks.yml | .gitlab/ci/pr-checks.yml | Pull/merge request |

Key properties:

  • Zero stored secrets — GitHub uses OIDC federation; GitLab uses Workload Identity Federation via a dedicated CI managed identity provisioned in Bicep
  • Path-filtered — only relevant pipelines run per commit
  • Security scanning — Gitleaks secret scanning, pip-audit dependency auditing
  • Docker builds with BuildKit and multi-platform support
Ingestion Pipeline

The ingestion service (ingestion/) transforms raw documents into searchable index entries:

| Stage | Description |
| --- | --- |
| Connectors | PDF (PyMuPDF), DOCX (python-docx), TXT, CSV parsers (ingestion/src/connectors/) |
| SharePoint sync | Graph API integration for syncing files and pages to blob storage |
| Chunking | Token-aware text splitting with tiktoken |
| Embedding | Azure OpenAI text-embedding-3-large via managed identity |
| Indexing | Azure AI Search with hybrid (BM25 + vector) index schema |
| Scheduling | Hourly indexer runs via Azure AI Search indexer pipeline |
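
The chunking stage can be illustrated with a small token-window splitter. The following is a minimal sketch, not the ingestion service's code: it uses whitespace splitting as a stand-in tokenizer so it runs without dependencies, and the window size and overlap are assumed parameters. With tiktoken, one would presumably pass an encoding's `encode` and `decode` methods instead.

```python
from typing import Callable, Sequence


def split_by_tokens(
    text: str,
    max_tokens: int = 4,   # assumption: real chunk sizes are far larger
    overlap: int = 1,      # assumption: the README does not mention overlap
    encode: Callable[[str], Sequence] = str.split,
    decode: Callable[[Sequence], str] = " ".join,
) -> list[str]:
    """Split text into windows of at most `max_tokens` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    tokens = encode(text)
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


print(split_by_tokens("a b c d e f g"))
```

Counting tokens rather than characters keeps each chunk within the embedding model's input limit regardless of how the text tokenizes.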
Testing
| Suite | Location | What it covers |
| --- | --- | --- |
| Unit | api/tests/unit/ | 28 modules: agents, routes, middleware, RAG tool, config, output parsing, telemetry, Langfuse |
| Security | api/tests/security/ | JWT bypass, prompt injection, conversation isolation |
| Integration | api/tests/integration/ | Multi-turn conversation flows against real Postgres |
| Eval | api/tests/eval/ | LLM-judged response quality with dataset-driven parametrisation and weighted rubric scoring |
| Load | api/tests/load/ | Locust load testing (locustfile.py) |
| Smoke | web/playwright.config.ts | Playwright browser smoke tests |
| Ingestion | ingestion/tests/ | Connector and pipeline tests |

Run with: just test (unit + security), just test-integration, just eval, just smoke.


Development

| Command | Description |
| --- | --- |
| just dev | Run API with hot reload (port 8090); auto-starts Postgres and runs migrations |
| just devui | Launch DevUI, an interactive agent chat with tool call tracing (port 8091) |
| just web | Run web frontend (port 3000) |
| just desktop | Run Tauri desktop app |
| just test | Run unit + security tests |
| just test-integration | Run integration tests against real Postgres |
| just eval | Run LLM-judged eval suite |
| just smoke | Run Playwright smoke tests |
| just lint | Lint all Python code (ruff) |
| just typecheck | Type-check all Python code (pyright) |
| just format | Format all Python code |
| just audit | Run pip-audit security scanning |
| just otel | Start OpenTelemetry collector for local telemetry |
| just langfuse | Start local Langfuse trace viewer at http://localhost:3100 |
| just admin | Open the dev admin dashboard |
| just ask "question" | Ask the dev agent about the codebase |
| just ask-repl | Start interactive dev agent session |
| just setup-dev | Deploy dev Azure resources + generate .env |
| just teardown-dev | Delete dev Azure resources |
| just deploy | Deploy API + web containers to Azure |
| just deploy-all | Deploy infrastructure + all containers |

Links

Security Model: docs/security-model.md
Desktop App: docs/tauri-desktop-app.md
Load Testing: api/tests/load/README.md
Contributing: CONTRIBUTING.md
Code of Conduct: CODE_OF_CONDUCT.md
Security Policy: SECURITY.md

Tech Stack
| Layer | Technology |
| --- | --- |
| API | Python 3.12, FastAPI 0.115+, Pydantic 2, agent-framework |
| LLM | Anthropic Claude (Haiku routing, Sonnet specialist), via direct API or Azure AI Foundry |
| RAG | Azure AI Search (hybrid BM25 + vector), Azure OpenAI text-embedding-3-large |
| Database | PostgreSQL 17 with Alembic migrations |
| Web | React 19, Vite 7, TailwindCSS 4, TypeScript strict |
| Desktop | Tauri 2 (Rust shell + shared web frontend) |
| Mobile | React Native + Expo 54, NativeWind |
| Shared UI | surf-kit: hooks, theme, icons, agent protocol |
| Auth | Microsoft Entra ID (JWKS) + HMAC guest tokens + MSAL |
| Observability | OpenTelemetry, Azure Monitor, Langfuse v3 |
| Infra | Bicep (Azure Verified Modules), Container Apps, VNet, Key Vault |
| CI/CD | GitHub Actions + GitLab CI (OIDC / WIF, zero stored secrets) |
| Testing | pytest, Playwright, Locust, LLM eval judge |
| Quality | ruff (lint + format), pyright (strict types), pip-audit, Gitleaks |

Apache-2.0

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| 0.0.0 | No release found, using repo HEAD | High | 4/8/2026 |
| main@2026-04-08 | Latest activity on main branch | High | 4/8/2026 |
| main@2026-04-08 | Latest activity on main branch | Medium | 4/8/2026 |

Similar Packages

| Package | Description | Version |
| --- | --- | --- |
| Wee-Orchestrator | 🍀 Self-hosted multi-agent AI orchestrator: chat with Claude, Gemini & Copilot CLI from Telegram, WebEx, or browser. 5 runtimes, 17+ models, task scheduling, skill plugins. | main@2026-04-21 |
| agent-framework | A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET. | python-1.1.0 |
| ryeos | A data-driven, cryptographically signed, registry-backed AI operating system, with capability-scoped execution and graph-executable workflows, living inside your projects, running through a recursive | main@2026-04-20 |
| intentkit | IntentKit is an open-source, self-hosted cloud agent cluster that manages a collaborative team of AI agents for you. | v0.17.60 |
| heartbeat-agent-framework | The open-source framework that makes AI agents proactive, self-learning, and autonomous. Multi-project tracking, full logging pipeline, message discipline, and memory review system. | 0.0.0 |