freshcrate

voidllm

Privacy-first LLM proxy and AI gateway - load balancing, multi-provider routing, API key management, usage tracking, rate limiting. Self-hosted. Zero knowledge of your prompts.


README

VoidLLM


A privacy-first LLM proxy and AI gateway for teams that take control seriously.

VoidLLM is a self-hosted LLM proxy that sits between your applications and LLM providers - OpenAI, Anthropic, Azure, Ollama, vLLM, or any custom endpoint. It gives you organization-wide access control, API key management, usage tracking, rate limiting, and multi-deployment load balancing. One Go binary, sub-2ms proxy overhead, zero knowledge of your prompts.

Screenshots: Dashboard, Usage Analytics, API Keys, Playground.

Privacy-First by Design: VoidLLM is a zero-knowledge LLM proxy - it never stores, logs, or persists any prompt or response content. Not as a setting you can toggle - by architecture. Only metadata is tracked: who made the request, which model, how many tokens, how long it took. Your data stays yours.


Why VoidLLM?

| Problem | How VoidLLM solves it |
|---|---|
| Teams share raw API keys in Slack | Virtual keys with org/team/user scoping and RBAC |
| No visibility into who's spending what | Per-key, per-team, per-org usage tracking + cost estimation |
| One runaway script burns the monthly budget | Rate limits + token budgets enforced by the proxy at every level |
| Switching providers means changing every app | Model aliases - clients call `default`, the proxy routes it anywhere |
| Provider goes down, everything breaks | Multi-deployment load balancing with automatic failover |
| Existing proxies log your prompts | Zero-knowledge proxy architecture - content never touches disk |

Quick Start

# Generate required keys
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)

# Start the LLM proxy with Docker
docker run -p 8080:8080 \
  -e VOIDLLM_ADMIN_KEY -e VOIDLLM_ENCRYPTION_KEY \
  -v $(pwd)/voidllm.yaml:/etc/voidllm/voidllm.yaml:ro \
  -v voidllm_data:/data \
  ghcr.io/voidmind-io/voidllm:latest

Binary (no Docker needed)

Download the latest binary for your platform from the releases page:

# Linux
curl -sL https://github.com/voidmind-io/voidllm/releases/latest/download/voidllm-linux-amd64.tar.gz | tar xz
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
./voidllm

Available for: Linux (amd64, arm64), Windows (amd64, arm64), macOS (amd64, arm64).

On first start, VoidLLM prints your credentials to stdout:

========================================
 BOOTSTRAP COMPLETE - COPY THESE NOW
========================================
  API Key:    vl_uk_a3f2...
  Email:      admin@voidllm.local
  Password:   <random>
========================================

Open http://localhost:8080, log in with the email and password above, and start proxying. The API key is used for SDK calls (Authorization: Bearer vl_uk_...). These credentials are shown once - save them.

One-Click Deploy

Deploy on Railway

Keys are auto-generated. Open the URL Railway gives you and start adding models.

# Your apps just point at the proxy instead of the provider
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer vl_uk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"hello"}]}'

Any OpenAI-compatible SDK works out of the box - just change the base URL to your VoidLLM proxy.
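As a minimal sketch using only the Python standard library, a request to the proxy is an ordinary OpenAI-format request with the base URL swapped (the URL and key below are placeholders, not real credentials):

```python
import json
import urllib.request

# Placeholder values - substitute your proxy URL and a real vl_uk_... key.
BASE_URL = "http://localhost:8080/v1"
API_KEY = "vl_uk_example"

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "default",  # an alias - the proxy decides where this routes
        "messages": [{"role": "user", "content": "hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; the response body is the
# provider's OpenAI-format reply, passed through unmodified.
```

The same substitution works in any OpenAI-compatible SDK by setting its base URL option.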

Features

| Feature | Details |
|---|---|
| OpenAI-compatible proxy | `/v1/chat/completions`, embeddings, images, audio, streaming |
| Multi-provider routing | OpenAI, Anthropic, Azure, Ollama, vLLM, any custom endpoint |
| Load balancing | Round-robin, least-latency, weighted, priority across deployments |
| Automatic failover | Retry on 5xx/timeout, circuit breakers, health-aware routing |
| Web UI | Dashboard, playground, API keys, teams, models, usage, settings |
| RBAC | Org > Team > User > Key hierarchy, 4 roles |
| Rate limits | Requests per minute/day, most-restrictive-wins across levels |
| Token budgets | Daily/monthly limits, real-time enforcement |
| Usage tracking | Tokens, cost, duration, TTFT per request |
| Model aliases | Clients call `default`, you control where it routes |
| MCP gateway | Proxy external MCP servers with access control and session management |
| Code Mode | WASM-sandboxed JS for multi-tool orchestration |
| Prometheus metrics | Latency, tokens, active streams, routing, health |
| Database | SQLite (default) or PostgreSQL |
| Deployment | Docker, Helm chart, graceful shutdown |
| **Pro ($49/mo)** | Everything above, plus: |
| Cost reports | Model breakdown, daily trends |
| Usage export | CSV download |
| Data retention | Extended |
| Support | Priority email |
| **Enterprise ($149/mo)** | Everything in Pro, plus: |
| SSO / OIDC | Google, Azure AD, Okta, Keycloak, any provider |
| Per-org SSO | Each organization gets its own Identity Provider |
| Auto-provisioning | Users created from allowed email domains |
| Group sync | OIDC groups mapped to VoidLLM teams |
| Audit logs | Every admin action, filterable API + UI |
| OpenTelemetry | OTLP/gRPC export, request ID correlation |
| Support | Dedicated Slack |

Founding Member ($999 one-time): All Enterprise features, lifetime license, Product Advisory Board, direct founder access. Limited spots.

Flat pricing - no per-user fees, no per-request charges. Self-hosted on your infrastructure.
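The most-restrictive-wins rule for rate limits can be pictured with a small sketch. This is an illustration of the documented behavior under assumed semantics, not VoidLLM's actual code:

```python
def effective_rpm(org=None, team=None, user=None, key=None):
    """Illustrative: the tightest requests-per-minute cap set at any level wins."""
    limits = [l for l in (org, team, user, key) if l is not None]
    return min(limits) if limits else None  # None: no cap set at any level

# A generous org cap does not help if the key itself is capped at 60 rpm.
print(effective_rpm(org=1000, team=600, key=60))  # -> 60
```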


MCP Gateway

VoidLLM is an MCP gateway - it exposes built-in management tools and proxies requests to external MCP servers with access control, usage tracking, and automatic session management.

Built-in Tools

| Tool | Description |
|---|---|
| `list_models` | List models with health status (RBAC-scoped) |
| `get_model_health` | Health status for a specific model or deployment |
| `get_usage` | Usage stats for your key/team/org |
| `list_keys` | API keys visible to you |
| `create_key` | Create a temporary API key |
| `list_deployments` | Deployment details (system_admin only) |

External MCP Servers

Register external MCP servers via the Admin UI or API. VoidLLM proxies tool calls through /api/v1/mcp/:alias with scoped access control (global, org, or team level), automatic session management, usage tracking, and Prometheus metrics.

Code Mode

Code Mode lets LLMs write JavaScript that orchestrates multiple MCP tool calls in a single execution - instead of one tool call per LLM turn. The JS runs in a WASM-sandboxed QuickJS runtime with no filesystem, no network, and no host access. Reduces token usage by 30-80%.

mcp:
  code_mode:
    enabled: true
    pool_size: 8          # concurrent WASM runtimes
    memory_limit_mb: 16   # per execution
    timeout: 30s          # per execution
    max_tool_calls: 50    # per execution

Code Mode exposes three tools on /api/v1/mcp:

| Tool | Description |
|---|---|
| `list_servers` | Discover available MCP servers and tool counts |
| `search_tools` | Find tools by keyword across all servers |
| `execute_code` | Run JS with MCP tools as `await tools.alias.toolName(args)` |

TypeScript type declarations are auto-generated from tool schemas and included in the execute_code description, so LLMs see available tools and argument types at tools/list time.

Admins can block specific tools from Code Mode via the per-tool blocklist API and UI.
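Since the endpoint speaks standard MCP JSON-RPC, a call to `execute_code` is an ordinary `tools/call` request. The sketch below builds such a payload; the `code` argument name and the JS body are assumptions for illustration - check the tool schema returned by `tools/list`:

```python
import json

# Illustrative tools/call payload for the Code Mode execute_code tool.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "execute_code",
        "arguments": {
            # Hypothetical orchestration: one execution, multiple tool calls.
            "code": "const r = await tools.aws.search({q: 'lambda'}); return r;",
        },
    },
}
body = json.dumps(payload)
```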

IDE Setup

{
  "mcpServers": {
    "voidllm": {
      "type": "http",
      "url": "http://your-voidllm-instance:8080/api/v1/mcp",
      "headers": { "Authorization": "Bearer vl_uk_your_key" }
    }
  }
}

This connects your IDE (Claude Code, Cursor, Windsurf) to the Code Mode endpoint. Management tools (list_models, get_usage, etc.) are available at /api/v1/mcp/voidllm. External MCP servers at /api/v1/mcp/:alias.

Known Limitations

  • SSE transport not supported - MCP servers using the deprecated SSE protocol (pre-2025-03-26 spec) are auto-detected and deactivated. Use servers that support Streamable HTTP.
  • No OAuth for upstream MCP servers - servers requiring per-user OAuth (Jira, Slack, Google) are not yet supported. API key and header auth work.
  • Single instance only - Code Mode's WASM runtime pool is in-memory. Multi-pod deployments require Redis support (coming soon).

Documentation

Full documentation | Blog | FAQ

| Topic | Guide |
|---|---|
| Getting Started | Quick Start |
| Configuration | All YAML settings |
| Docker | Docker deployment |
| Kubernetes | Helm chart |
| Providers | OpenAI, Anthropic, Azure, Ollama, vLLM |
| Load Balancing | Strategies, failover, circuit breakers |
| MCP Gateway | Overview - Servers - Code Mode - IDE Setup |
| RBAC | Roles and permissions |
| Privacy | Zero-knowledge architecture |
| API Reference | Endpoints and error codes |
| Enterprise | License - SSO - Audit - OTel - Pricing |
| Troubleshooting | Common issues |

Configuration

server:
  proxy:
    port: 8080

models:
  # Single endpoint
  - name: dolphin-mistral
    provider: ollama
    base_url: http://localhost:11434/v1
    timeout: 30s
    aliases: [default]
    pricing:
      input_per_1m: 0.15
      output_per_1m: 0.60

  # Load balanced - multiple deployments with failover
  - name: gpt-4o
    strategy: round-robin
    aliases: [smart]
    deployments:
      - name: azure-east
        provider: azure
        base_url: https://eastus.openai.azure.com
        api_key: ${AZURE_EAST_KEY}
        azure_deployment: gpt-4o
        priority: 1
      - name: openai-fallback
        provider: openai
        base_url: https://api.openai.com/v1
        api_key: ${OPENAI_KEY}
        priority: 2

mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  admin_key: ${VOIDLLM_ADMIN_KEY}
  encryption_key: ${VOIDLLM_ENCRYPTION_KEY}
  mcp:
    code_mode:
      enabled: true

Supported providers: openai · anthropic · azure · vllm · ollama · custom

Environment variables are interpolated with ${VAR} syntax, so secrets are never hardcoded in the config file.
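Given per-million-token prices like the `pricing` block above, cost estimation presumably reduces to simple arithmetic (an illustrative sketch, not VoidLLM's code):

```python
def estimate_cost(prompt_tokens, completion_tokens, input_per_1m, output_per_1m):
    """Cost in dollars, assuming prices are per one million tokens."""
    return (prompt_tokens * input_per_1m + completion_tokens * output_per_1m) / 1_000_000

# 1200 prompt + 400 completion tokens at the dolphin-mistral prices above:
print(estimate_cost(1200, 400, input_per_1m=0.15, output_per_1m=0.60))  # -> 0.00042
```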

Deployment

Docker Compose

cp voidllm.yaml.example voidllm.yaml
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker-compose up

Kubernetes (Helm)

helm install voidllm chart/voidllm/ \
  --set secrets.adminKey=$(openssl rand -base64 32) \
  --set secrets.encryptionKey=$(openssl rand -base64 32) \
  --set config.models[0].name=my-model \
  --set config.models[0].provider=ollama \
  --set config.models[0].base_url=http://ollama:11434/v1

PostgreSQL and Redis are available as optional subcharts for production deployments.

From Source

# Prerequisites: Go 1.23+, Node 20+
cd ui && npm ci && npm run build && cd ..
go run ./cmd/voidllm --config voidllm.yaml

Privacy

Zero-knowledge content handling is not a feature toggle. It's an architectural decision that makes VoidLLM a privacy-first LLM proxy.

  • No request body in logs, DB, or any persistent storage
  • No response body in logs, DB, or any persistent storage
  • No prompt caching - content passes through memory only
  • Usage events contain only: who (key/org/team), what (model), how much (tokens/cost)
  • There is no enable_content_logging option. It doesn't exist.
  • Designed to support GDPR compliance - no personal data in prompts is stored or processed
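As a concrete sketch of what a metadata-only usage event looks like, here is an illustrative record; the field names are assumptions for illustration, not the actual schema:

```python
# Illustrative metadata-only usage event: no prompt or response content anywhere.
usage_event = {
    "key_id": "vl_uk_a3f2",     # who
    "team_id": "platform",
    "model": "gpt-4o",          # what
    "prompt_tokens": 1200,      # how much
    "completion_tokens": 400,
    "duration_ms": 840,
    "ttft_ms": 120,
}
assert not any(k in usage_event for k in ("messages", "prompt", "content"))
```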

CLI Tools

# Bidirectional database migration
voidllm migrate --from sqlite:///data/voidllm.db --to postgres://user:pass@host/db

# License management (for Enterprise)
voidllm license verify < license.jwt

License

Business Source License 1.1 - source available, self-hosting permitted, competing hosted services prohibited. Converts to Apache 2.0 four years after each release.


Built by VoidMind · voidllm.ai

This project was built with significant assistance from AI (Claude by Anthropic).

Release History

v0.0.16 (Medium, 4/12/2026)
Features: Model fallback chains - cross-model failover when all deployments of the primary are unavailable (Enterprise, #45) - Configurable chain depth via `settings.fallback_max_depth` - Per-hop access control enforcement - Cycle detection at config, API, and runtime - Usage events track both requested and served model name - UI: Fallback Model dropdown in model create and edit dialogs - UI: depth-0 warning when fallback is configured but disabled. Fixes: Flaky MCP usage d

voidllm-0.0.16 (Medium, 4/12/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.15 (Medium, 4/7/2026)
Features: Configurable data retention for usage events and audit logs (#46) - Opt-in background cleanup job with per-table retention durations - Dialect-aware SQL for correct SQLite and PostgreSQL behavior - Batched deletes with single-column timestamp indexes - Admin UI update notification via GitHub release check - PostgreSQL migration locking via advisory lock prevents concurrent-migration races (#48). Improvements: Batch dependency updates: grpc 1.80.0, OpenTelemetry 1.43.0,

voidllm-0.0.15 (Medium, 4/7/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.14 (Medium, 4/4/2026)
Features: MCP OAuth Client Credentials auth type with token URL auto-discovery (#49) - Google Gemini and Vertex AI provider adapter (8 providers total) - MCP usage dashboard with tabbed layout - Overview, LLM, MCP (#44) - Binary deployment documentation for Linux, macOS, Windows. Improvements: Shared credentials warning banner in MCP server dialogs - Windows binary pauses on error to show message before closing - 42 new tests for MCP usage, handlers, and health checker. Full changelog:

voidllm-0.0.14 (Medium, 4/4/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.13 (Medium, 4/4/2026)
Features: MCP server health indicators in UI with auto-refresh (#43) - Standalone binary support for Windows, Linux, macOS (#50) - Cross-platform binaries in GitHub Release pipeline - License instance identification via heartbeat - Bench metrics sampler with realistic streaming scenario. Improvements: Comprehensive logging review: audit coverage for MCP, SSO, license, settings - Key cache log noise reduced (INFO to DEBUG) - Rate limit and token budget violations now logged - Migration

voidllm-0.0.13 (Medium, 4/4/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.12 (Medium, 4/2/2026)
Fixes: Usage dashboard: handle NULL team_id/key_id/user_id in aggregation queries (#51) - License set via UI now persists to database across restarts - License startup log shows source (database, config, or none) - Heartbeat User-Agent includes VoidLLM version - Updated embedded license public key. Documentation: README feature list as two-column table, removed em dashes - Corrected GDPR compliance language. Full changelog: https://github.com/voidmind-io/voidllm/blob/main/CHANGELOG.md

voidllm-0.0.12 (Medium, 4/2/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.11 (Medium, 4/2/2026)
Documentation: Restructured docs into 24 files with subdirectories (deployment/, models/, mcp/, security/, enterprise/, api/) - Added getting-started guide, troubleshooting, and docs index - All doc files include Astro frontmatter for website rendering - Docs now live at [voidllm.ai/docs](https://voidllm.ai/docs). Helm Chart: Fixed Artifact Hub indexing (removed empty signKey annotation). CI: Pinned all GitHub Actions to commit hashes - Added Cosign image signing and SLSA provenan

voidllm-0.0.11 (Medium, 4/2/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.10 (Medium, 4/1/2026)
Helm Chart: Published to [Artifact Hub](https://artifacthub.io/packages/helm/voidllm/voidllm) - Chart README with quick start and configuration examples - Added icon, keywords, license annotation, documentation links. Documentation: Bootstrap credentials clarified in README Quick Start - Blog link added to Documentation section - Artifact Hub badge in README. Pricing: Pro: $49/mo (was $299) - Enterprise: $149/mo (was $799) - Founding Member: $999 one-time (lifetime enterprise, lim

voidllm-0.0.10 (Medium, 4/1/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.9 (Medium, 3/30/2026)
Docker, Helm & Configuration: Fixed image registry - Docker Compose now uses ghcr.io/voidmind-io/voidllm - Helm chart updated - correct registry, MCP, Code Mode, and health check settings - Istio support - optional Gateway + VirtualService templates - MCP servers in Helm - static MCP server definitions via config.mcpServers - Example config expanded - MCP, Code Mode, logging, health check, and enterprise sections

v0.0.8 (Medium, 3/30/2026)
Performance: sonic JSON engine - faster JSON serialization across all hot paths - In-memory caches - MCP server lookups, access checks, and transport pooling moved out of the DB hot path - MCP proxy overhead reduced 36% - 670us to 427us P50 at 1000 RPS. MCP Access Management: Closed-by-default for global servers - organizations must explicitly grant access to global MCP servers (org-scoped and team-scoped servers are unaffected) - MCP Access API - GET/PUT /orgs/:org

v0.0.7 (Medium, 3/29/2026)
Code Mode: LLMs write JavaScript to orchestrate multiple MCP tool calls in a single WASM-sandboxed execution - reducing token usage by 30-80%. Inspired by [Cloudflare's Code Mode](https://blog.cloudflare.com/code-mode-mcp/), but fully self-hosted with a QuickJS/WASM sandbox (Wazero, pure Go, no CGO). New MCP tools on `/api/v1/mcp`: `list_servers` - discover available MCP servers - `search_tools` - find tools by keyword across servers - `execute_code` - run JS with MCP tools as `await

v0.0.6 (Medium, 3/28/2026)
MCP Gateway: VoidLLM is now an MCP Gateway - register external MCP servers and proxy tool calls through VoidLLM with access control, session management, and usage tracking. Proxy - `/api/v1/mcp/:alias` routes JSON-RPC to any registered MCP server - Session management - automatic initialize + `Mcp-Session-Id` forwarding, re-init on expiry - Scoped registration - global (system_admin), org (org_admin), team (team_admin) - Alias shadowing - team > org > global p

v0.0.5 (Medium, 3/26/2026)
Multi-Deployment Load Balancing: Configure multiple deployments per model across providers and regions - 4 routing strategies: round-robin, least-latency, weighted, priority - Automatic failover: retries next deployment on 5xx, timeout, or connection error - Per-deployment circuit breakers with independent cooldown periods - Per-deployment health probing - unhealthy deployments skipped during routing - Community feature - no license required. Load Balancing UI: Create Mo

v0.0.4 (Medium, 3/24/2026)
Model Types: `model_type` field across the full stack - chat, embedding, reranking, completion, image, audio_transcription, tts - Type badge on Models page with color-coded variants - Type selector in Create and Edit Model dialogs - Health checker: type-aware functional probe (skips non-chat types) - `/me/available-models` returns `{name, type}` objects. Playground Tabs: Type-based tabs (Chat / Embedding / Completion) - only shown when models of that type exist - Embeddi

v0.0.3 (Medium, 3/23/2026)
Complete UI Redesign: Premium dark-theme overhaul of every page in the admin dashboard. New Chart Components: AreaChart - line/area with gradient fill (Recharts) - DonutChart - ring chart with center label (pure SVG) - HorizontalBar - horizontal progress bars - MiniTable - compact data table. Visual Upgrades: Glassmorphism dialogs - backdrop-blur, semi-transparent, purple accent border - Segmented pill tabs - replaces underline tabs across all detail pages -

v0.0.2 (Medium, 3/23/2026)
Security: Fix CVE in grpc-go (authorization bypass) - upgraded to v1.79.3 - Fix potential integer overflow in Anthropic adapter (CodeQL finding) - Add OpenSSF Scorecard + CodeQL workflows. Bug Fixes: Fix user key creation without team membership - Fix all Dialog form submissions (Portal submit bug) - Fix UI feature gating - enterprise pages show UpgradePrompt when not licensed - Fix Sidebar locked items - clickable with upgrade prompt instead of dead links - Fix flak

v0.0.1 (Medium, 3/23/2026)
VoidLLM v0.0.1 - the privacy-first LLM proxy for teams. First public release. Highlights: OpenAI-compatible proxy with provider adapters (Anthropic, Azure, Ollama, vLLM) - Org/team/user hierarchy with 4-role RBAC - API key management with HMAC-SHA256 hashing and key rotation - Usage tracking with cost reports, hourly rollups, and cross-org analytics - Rate limiting - in-memory (single instance) or distributed via Redis - Enterprise features - SSO/OIDC, audit logs


Similar Packages

- axonhub (v0.9.35) - ⚡️ Open-source AI Gateway - Use any SDK to call 100+ LLMs. Built-in failover, load balancing, cost control & end-to-end tracing.
- toolhive (v0.23.1) - ToolHive is an enterprise-grade platform for running and managing Model Context Protocol (MCP) servers.
- Agenvoy (v0.19.4) - Agentic framework | Self-improving memory | Pluggable tool extensions | Sandbox execution
- goclaw (v3.10.0) - GoClaw is OpenClaw rebuilt in Go - with multi-tenant isolation, 5-layer security, and native concurrency. Deploy AI agent teams at scale without compromising on safety.
- llm-gateway (v0.1.4) - Zero trust LLM gateway. OpenAI-compatible proxy with semantic routing and load balancing across OpenAI, Anthropic, Ollama, vLLM, and any compatible backend. Identity-based access, virtual A