freshcrate

voidllm

Privacy-first LLM proxy and AI gateway - load balancing, multi-provider routing, API key management, usage tracking, rate limiting. Self-hosted. Zero knowledge of your prompts.


README

VoidLLM


A privacy-first LLM proxy and AI gateway for teams that take control seriously.

VoidLLM is a self-hosted LLM proxy that sits between your applications and LLM providers - OpenAI, Anthropic, Azure, Ollama, vLLM, or any custom endpoint. It gives you organization-wide access control, API key management, usage tracking, rate limiting, and multi-deployment load balancing. One Go binary, sub-2ms proxy overhead, zero knowledge of your prompts.

Screenshots: Dashboard, Usage Analytics, API Keys, Playground.

Privacy-First by Design: VoidLLM is a zero-knowledge LLM proxy - it never stores, logs, or persists any prompt or response content. Not as a setting you can toggle - by architecture. Only metadata is tracked: who made the request, which model, how many tokens, how long it took. Your data stays yours.


Why VoidLLM?

| Problem | How VoidLLM solves it |
|---|---|
| Teams share raw API keys in Slack | Virtual keys with org/team/user scoping and RBAC |
| No visibility into who's spending what | Per-key, per-team, per-org usage tracking + cost estimation |
| One runaway script burns the monthly budget | Rate limits + token budgets enforced by the proxy at every level |
| Switching providers means changing every app | Model aliases - clients call `default`, the proxy routes it anywhere |
| Provider goes down, everything breaks | Multi-deployment load balancing with automatic failover |
| Existing proxies log your prompts | Zero-knowledge proxy architecture - content never touches disk |

Quick Start

# Generate required keys
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)

# Start the LLM proxy with Docker
docker run -p 8080:8080 \
  -e VOIDLLM_ADMIN_KEY -e VOIDLLM_ENCRYPTION_KEY \
  -v $(pwd)/voidllm.yaml:/etc/voidllm/voidllm.yaml:ro \
  -v voidllm_data:/data \
  ghcr.io/voidmind-io/voidllm:latest

Binary (no Docker needed)

Download the latest binary for your platform from the releases page:

# Linux
curl -sL https://github.com/voidmind-io/voidllm/releases/latest/download/voidllm-linux-amd64.tar.gz | tar xz
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
./voidllm

Available for: Linux (amd64, arm64), Windows (amd64, arm64), macOS (amd64, arm64).

On first start, VoidLLM prints your credentials to stdout:

========================================
 BOOTSTRAP COMPLETE - COPY THESE NOW
========================================
  API Key:    vl_uk_a3f2...
  Email:      admin@voidllm.local
  Password:   <random>
========================================

Open http://localhost:8080, log in with the email and password above, and start proxying. The API key is used for SDK calls (Authorization: Bearer vl_uk_...). These credentials are shown once - save them.

One-Click Deploy

Deploy on Railway

Keys are auto-generated. Open the URL Railway gives you and start adding models.

# Your apps just point at the proxy instead of the provider
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer vl_uk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"hello"}]}'

Any OpenAI-compatible SDK works out of the box - just change the base URL to your VoidLLM proxy.
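As a minimal sketch using only the Python standard library, a request to the proxy is an ordinary OpenAI-format request with the base URL swapped (the URL and key below are placeholders, not real credentials):

```python
import json
import urllib.request

# Placeholder values - substitute your proxy URL and a real vl_uk_... key.
BASE_URL = "http://localhost:8080/v1"
API_KEY = "vl_uk_example"

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "default",  # an alias - the proxy decides where this routes
        "messages": [{"role": "user", "content": "hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; the response body is the
# provider's OpenAI-format reply, passed through unmodified.
```

The same substitution works in any OpenAI-compatible SDK by setting its base URL option.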

Features

| Feature | Details |
|---|---|
| OpenAI-compatible proxy | `/v1/chat/completions`, embeddings, images, audio, streaming |
| Multi-provider routing | OpenAI, Anthropic, Azure, Ollama, vLLM, any custom endpoint |
| Load balancing | Round-robin, least-latency, weighted, priority across deployments |
| Automatic failover | Retry on 5xx/timeout, circuit breakers, health-aware routing |
| Web UI | Dashboard, playground, API keys, teams, models, usage, settings |
| RBAC | Org > Team > User > Key hierarchy, 4 roles |
| Rate limits | Requests per minute/day, most-restrictive-wins across levels |
| Token budgets | Daily/monthly limits, real-time enforcement |
| Usage tracking | Tokens, cost, duration, TTFT per request |
| Model aliases | Clients call `default`, you control where it routes |
| MCP gateway | Proxy external MCP servers with access control and session management |
| Code Mode | WASM-sandboxed JS for multi-tool orchestration |
| Prometheus metrics | Latency, tokens, active streams, routing, health |
| Database | SQLite (default) or PostgreSQL |
| Deployment | Docker, Helm chart, graceful shutdown |
| **Pro ($49/mo)** | Everything above, plus: |
| Cost reports | Model breakdown, daily trends |
| Usage export | CSV download |
| Data retention | Extended |
| Support | Priority email |
| **Enterprise ($149/mo)** | Everything in Pro, plus: |
| SSO / OIDC | Google, Azure AD, Okta, Keycloak, any provider |
| Per-org SSO | Each organization gets its own Identity Provider |
| Auto-provisioning | Users created from allowed email domains |
| Group sync | OIDC groups mapped to VoidLLM teams |
| Audit logs | Every admin action, filterable API + UI |
| OpenTelemetry | OTLP/gRPC export, request ID correlation |
| Support | Dedicated Slack |

Founding Member ($999 one-time): All Enterprise features, lifetime license, Product Advisory Board, direct founder access. Limited spots.

Flat pricing - no per-user fees, no per-request charges. Self-hosted on your infrastructure.
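The most-restrictive-wins rule for rate limits can be pictured with a small sketch. This is an illustration of the documented behavior under assumed semantics, not VoidLLM's actual code:

```python
def effective_rpm(org=None, team=None, user=None, key=None):
    """Illustrative: the tightest requests-per-minute cap set at any level wins."""
    limits = [l for l in (org, team, user, key) if l is not None]
    return min(limits) if limits else None  # None: no cap set at any level

# A generous org cap does not help if the key itself is capped at 60 rpm.
print(effective_rpm(org=1000, team=600, key=60))  # -> 60
```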


MCP Gateway

VoidLLM is an MCP gateway - it exposes built-in management tools and proxies requests to external MCP servers with access control, usage tracking, and automatic session management.

Built-in Tools

| Tool | Description |
|---|---|
| `list_models` | List models with health status (RBAC-scoped) |
| `get_model_health` | Health status for a specific model or deployment |
| `get_usage` | Usage stats for your key/team/org |
| `list_keys` | API keys visible to you |
| `create_key` | Create a temporary API key |
| `list_deployments` | Deployment details (system_admin only) |

External MCP Servers

Register external MCP servers via the Admin UI or API. VoidLLM proxies tool calls through /api/v1/mcp/:alias with scoped access control (global, org, or team level), automatic session management, usage tracking, and Prometheus metrics.

Code Mode

Code Mode lets LLMs write JavaScript that orchestrates multiple MCP tool calls in a single execution - instead of one tool call per LLM turn. The JS runs in a WASM-sandboxed QuickJS runtime with no filesystem, no network, and no host access. Reduces token usage by 30-80%.

mcp:
  code_mode:
    enabled: true
    pool_size: 8          # concurrent WASM runtimes
    memory_limit_mb: 16   # per execution
    timeout: 30s          # per execution
    max_tool_calls: 50    # per execution

Code Mode exposes three tools on /api/v1/mcp:

| Tool | Description |
|---|---|
| `list_servers` | Discover available MCP servers and tool counts |
| `search_tools` | Find tools by keyword across all servers |
| `execute_code` | Run JS with MCP tools as `await tools.alias.toolName(args)` |

TypeScript type declarations are auto-generated from tool schemas and included in the execute_code description, so LLMs see available tools and argument types at tools/list time.

Admins can block specific tools from Code Mode via the per-tool blocklist API and UI.
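Since the endpoint speaks standard MCP JSON-RPC, a call to `execute_code` is an ordinary `tools/call` request. The sketch below builds such a payload; the `code` argument name and the JS body are assumptions for illustration - check the tool schema returned by `tools/list`:

```python
import json

# Illustrative tools/call payload for the Code Mode execute_code tool.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "execute_code",
        "arguments": {
            # Hypothetical orchestration: one execution, multiple tool calls.
            "code": "const r = await tools.aws.search({q: 'lambda'}); return r;",
        },
    },
}
body = json.dumps(payload)
```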

IDE Setup

{
  "mcpServers": {
    "voidllm": {
      "type": "http",
      "url": "http://your-voidllm-instance:8080/api/v1/mcp",
      "headers": { "Authorization": "Bearer vl_uk_your_key" }
    }
  }
}

This connects your IDE (Claude Code, Cursor, Windsurf) to the Code Mode endpoint. Management tools (list_models, get_usage, etc.) are available at /api/v1/mcp/voidllm. External MCP servers at /api/v1/mcp/:alias.

Known Limitations

  • SSE transport not supported - MCP servers using the deprecated SSE protocol (pre-2025-03-26 spec) are auto-detected and deactivated. Use servers that support Streamable HTTP.
  • No OAuth for upstream MCP servers - servers requiring per-user OAuth (Jira, Slack, Google) are not yet supported. API key and header auth work.
  • Single instance only - Code Mode's WASM runtime pool is in-memory. Multi-pod deployments require Redis support (coming soon).

Documentation

Full documentation | Blog | FAQ

| Topic | Guide |
|---|---|
| Getting Started | Quick Start |
| Configuration | All YAML settings |
| Docker | Docker deployment |
| Kubernetes | Helm chart |
| Providers | OpenAI, Anthropic, Azure, Ollama, vLLM |
| Load Balancing | Strategies, failover, circuit breakers |
| MCP Gateway | Overview - Servers - Code Mode - IDE Setup |
| RBAC | Roles and permissions |
| Privacy | Zero-knowledge architecture |
| API Reference | Endpoints and error codes |
| Enterprise | License - SSO - Audit - OTel - Pricing |
| Troubleshooting | Common issues |

Configuration

server:
  proxy:
    port: 8080

models:
  # Single endpoint
  - name: dolphin-mistral
    provider: ollama
    base_url: http://localhost:11434/v1
    timeout: 30s
    aliases: [default]
    pricing:
      input_per_1m: 0.15
      output_per_1m: 0.60

  # Load balanced - multiple deployments with failover
  - name: gpt-4o
    strategy: round-robin
    aliases: [smart]
    deployments:
      - name: azure-east
        provider: azure
        base_url: https://eastus.openai.azure.com
        api_key: ${AZURE_EAST_KEY}
        azure_deployment: gpt-4o
        priority: 1
      - name: openai-fallback
        provider: openai
        base_url: https://api.openai.com/v1
        api_key: ${OPENAI_KEY}
        priority: 2

mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  admin_key: ${VOIDLLM_ADMIN_KEY}
  encryption_key: ${VOIDLLM_ENCRYPTION_KEY}
  mcp:
    code_mode:
      enabled: true

Supported providers: openai · anthropic · azure · vllm · ollama · custom

Environment variables are interpolated with ${VAR} syntax, so secrets are never hardcoded in the config file.
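Given per-million-token prices like the `pricing` block above, cost estimation presumably reduces to simple arithmetic (an illustrative sketch, not VoidLLM's code):

```python
def estimate_cost(prompt_tokens, completion_tokens, input_per_1m, output_per_1m):
    """Cost in dollars, assuming prices are per one million tokens."""
    return (prompt_tokens * input_per_1m + completion_tokens * output_per_1m) / 1_000_000

# 1200 prompt + 400 completion tokens at the dolphin-mistral prices above:
print(estimate_cost(1200, 400, input_per_1m=0.15, output_per_1m=0.60))  # -> 0.00042
```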

Deployment

Docker Compose

cp voidllm.yaml.example voidllm.yaml
export VOIDLLM_ADMIN_KEY=$(openssl rand -base64 32)
export VOIDLLM_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker-compose up

Kubernetes (Helm)

helm install voidllm chart/voidllm/ \
  --set secrets.adminKey=$(openssl rand -base64 32) \
  --set secrets.encryptionKey=$(openssl rand -base64 32) \
  --set config.models[0].name=my-model \
  --set config.models[0].provider=ollama \
  --set config.models[0].base_url=http://ollama:11434/v1

PostgreSQL and Redis are available as optional subcharts for production deployments.

From Source

# Prerequisites: Go 1.23+, Node 20+
cd ui && npm ci && npm run build && cd ..
go run ./cmd/voidllm --config voidllm.yaml

Privacy

Zero-knowledge content handling is not a feature toggle. It's an architectural decision that makes VoidLLM a privacy-first LLM proxy.

  • No request body in logs, DB, or any persistent storage
  • No response body in logs, DB, or any persistent storage
  • No prompt caching - content passes through memory only
  • Usage events contain only: who (key/org/team), what (model), how much (tokens/cost)
  • There is no enable_content_logging option. It doesn't exist.
  • Designed to support GDPR compliance - no personal data in prompts is stored or processed
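As a concrete sketch of what a metadata-only usage event looks like, here is an illustrative record; the field names are assumptions for illustration, not the actual schema:

```python
# Illustrative metadata-only usage event: no prompt or response content anywhere.
usage_event = {
    "key_id": "vl_uk_a3f2",     # who
    "team_id": "platform",
    "model": "gpt-4o",          # what
    "prompt_tokens": 1200,      # how much
    "completion_tokens": 400,
    "duration_ms": 840,
    "ttft_ms": 120,
}
assert not any(k in usage_event for k in ("messages", "prompt", "content"))
```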

CLI Tools

# Bidirectional database migration
voidllm migrate --from sqlite:///data/voidllm.db --to postgres://user:pass@host/db

# License management (for Enterprise)
voidllm license verify < license.jwt

License

Business Source License 1.1 - source available, self-hosting permitted, competing hosted services prohibited. Converts to Apache 2.0 four years after each release.


Built by VoidMind · voidllm.ai

This project was built with significant assistance from AI (Claude by Anthropic).

Release History

v0.0.16 (Medium, 4/12/2026)
Features: Model fallback chains - cross-model failover when all deployments of the primary are unavailable (Enterprise, #45) - Configurable chain depth via `settings.fallback_max_depth` - Per-hop access control enforcement - Cycle detection at config, API, and runtime - Usage events track both requested and served model name - UI: Fallback Model dropdown in model create and edit dialogs - UI: depth-0 warning when fallback is configured but disabled. Fixes: Flaky MCP usage d

voidllm-0.0.16 (Medium, 4/12/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.15 (Medium, 4/7/2026)
Features: Configurable data retention for usage events and audit logs (#46) - Opt-in background cleanup job with per-table retention durations - Dialect-aware SQL for correct SQLite and PostgreSQL behavior - Batched deletes with single-column timestamp indexes - Admin UI update notification via GitHub release check - PostgreSQL migration locking via advisory lock prevents concurrent-migration races (#48). Improvements: Batch dependency updates: grpc 1.80.0, OpenTelemetry 1.43.0,

voidllm-0.0.15 (Medium, 4/7/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.14 (Medium, 4/4/2026)
Features: MCP OAuth Client Credentials auth type with token URL auto-discovery (#49) - Google Gemini and Vertex AI provider adapter (8 providers total) - MCP usage dashboard with tabbed layout - Overview, LLM, MCP (#44) - Binary deployment documentation for Linux, macOS, Windows. Improvements: Shared credentials warning banner in MCP server dialogs - Windows binary pauses on error to show message before closing - 42 new tests for MCP usage, handlers, and health checker. Full changelog:

voidllm-0.0.14 (Medium, 4/4/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.13 (Medium, 4/4/2026)
Features: MCP server health indicators in UI with auto-refresh (#43) - Standalone binary support for Windows, Linux, macOS (#50) - Cross-platform binaries in GitHub Release pipeline - License instance identification via heartbeat - Bench metrics sampler with realistic streaming scenario. Improvements: Comprehensive logging review: audit coverage for MCP, SSO, license, settings - Key cache log noise reduced (INFO to DEBUG) - Rate limit and token budget violations now logged - Migration

voidllm-0.0.13 (Medium, 4/4/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.12 (Medium, 4/2/2026)
Fixes: Usage dashboard: handle NULL team_id/key_id/user_id in aggregation queries (#51) - License set via UI now persists to database across restarts - License startup log shows source (database, config, or none) - Heartbeat User-Agent includes VoidLLM version - Updated embedded license public key. Documentation: README feature list as two-column table, removed em dashes - Corrected GDPR compliance language. Full changelog: https://github.com/voidmind-io/voidllm/blob/main/CHANGELOG.md

voidllm-0.0.12 (Medium, 4/2/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.11 (Medium, 4/2/2026)
Documentation: Restructured docs into 24 files with subdirectories (deployment/, models/, mcp/, security/, enterprise/, api/) - Added getting-started guide, troubleshooting, and docs index - All doc files include Astro frontmatter for website rendering - Docs now live at [voidllm.ai/docs](https://voidllm.ai/docs). Helm Chart: Fixed Artifact Hub indexing (removed empty signKey annotation). CI: Pinned all GitHub Actions to commit hashes - Added Cosign image signing and SLSA provenan

voidllm-0.0.11 (Medium, 4/2/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.10 (Medium, 4/1/2026)
Helm Chart: Published to [Artifact Hub](https://artifacthub.io/packages/helm/voidllm/voidllm) - Chart README with quick start and configuration examples - Added icon, keywords, license annotation, documentation links. Documentation: Bootstrap credentials clarified in README Quick Start - Blog link added to Documentation section - Artifact Hub badge in README. Pricing: Pro: $49/mo (was $299) - Enterprise: $149/mo (was $799) - Founding Member: $999 one-time (lifetime enterprise, lim

voidllm-0.0.10 (Medium, 4/1/2026)
Privacy-first LLM proxy and AI gateway with load balancing, RBAC, MCP gateway, and built-in admin UI. Self-hosted, single binary, sub-500us overhead.

v0.0.9 (Medium, 3/30/2026)
Docker, Helm & Configuration: Fixed image registry - Docker Compose now uses ghcr.io/voidmind-io/voidllm - Helm chart updated - correct registry, MCP, Code Mode, and health check settings - Istio support - optional Gateway + VirtualService templates - MCP servers in Helm - static MCP server definitions via config.mcpServers - Example config expanded - MCP, Code Mode, logging, health check, and enterprise sections

v0.0.8 (Medium, 3/30/2026)
Performance: sonic JSON engine - faster JSON serialization across all hot paths - In-memory caches - MCP server lookups, access checks, and transport pooling moved out of the DB hot path - MCP proxy overhead reduced 36% - 670us to 427us P50 at 1000 RPS. MCP Access Management: Closed-by-default for global servers - organizations must explicitly grant access to global MCP servers (org-scoped and team-scoped servers are unaffected) - MCP Access API - GET/PUT /orgs/:org

v0.0.7 (Medium, 3/29/2026)
Code Mode: LLMs write JavaScript to orchestrate multiple MCP tool calls in a single WASM-sandboxed execution - reducing token usage by 30-80%. Inspired by [Cloudflare's Code Mode](https://blog.cloudflare.com/code-mode-mcp/), but fully self-hosted with a QuickJS/WASM sandbox (Wazero, pure Go, no CGO). New MCP tools on `/api/v1/mcp`: `list_servers` - discover available MCP servers - `search_tools` - find tools by keyword across servers - `execute_code` - run JS with MCP tools as `await

v0.0.6 (Medium, 3/28/2026)
MCP Gateway: VoidLLM is now an MCP Gateway - register external MCP servers and proxy tool calls through VoidLLM with access control, session management, and usage tracking. Proxy - `/api/v1/mcp/:alias` routes JSON-RPC to any registered MCP server - Session management - automatic initialize + `Mcp-Session-Id` forwarding, re-init on expiry - Scoped registration - global (system_admin), org (org_admin), team (team_admin) - Alias shadowing - team > org > global p

v0.0.5 (Medium, 3/26/2026)
Multi-Deployment Load Balancing: Configure multiple deployments per model across providers and regions - 4 routing strategies: round-robin, least-latency, weighted, priority - Automatic failover: retries next deployment on 5xx, timeout, or connection error - Per-deployment circuit breakers with independent cooldown periods - Per-deployment health probing - unhealthy deployments skipped during routing - Community feature - no license required. Load Balancing UI: Create Mo

v0.0.4 (Medium, 3/24/2026)
Model Types: `model_type` field across the full stack - chat, embedding, reranking, completion, image, audio_transcription, tts - Type badge on Models page with color-coded variants - Type selector in Create and Edit Model dialogs - Health checker: type-aware functional probe (skips non-chat types) - `/me/available-models` returns `{name, type}` objects. Playground Tabs: Type-based tabs (Chat / Embedding / Completion) - only shown when models of that type exist - Embeddi

v0.0.3 (Medium, 3/23/2026)
Complete UI Redesign: Premium dark-theme overhaul of every page in the admin dashboard. New Chart Components: AreaChart - line/area with gradient fill (Recharts) - DonutChart - ring chart with center label (pure SVG) - HorizontalBar - horizontal progress bars - MiniTable - compact data table. Visual Upgrades: Glassmorphism dialogs - backdrop-blur, semi-transparent, purple accent border - Segmented pill tabs - replaces underline tabs across all detail pages -

v0.0.2 (Medium, 3/23/2026)
Security: Fix CVE in grpc-go (authorization bypass) - upgraded to v1.79.3 - Fix potential integer overflow in Anthropic adapter (CodeQL finding) - Add OpenSSF Scorecard + CodeQL workflows. Bug Fixes: Fix user key creation without team membership - Fix all Dialog form submissions (Portal submit bug) - Fix UI feature gating - enterprise pages show UpgradePrompt when not licensed - Fix Sidebar locked items - clickable with upgrade prompt instead of dead links - Fix flak

v0.0.1 (Medium, 3/23/2026)
VoidLLM v0.0.1 - the privacy-first LLM proxy for teams. First public release. Highlights: OpenAI-compatible proxy with provider adapters (Anthropic, Azure, Ollama, vLLM) - Org/team/user hierarchy with 4-role RBAC - API key management with HMAC-SHA256 hashing and key rotation - Usage tracking with cost reports, hourly rollups, and cross-org analytics - Rate limiting - in-memory (single instance) or distributed via Redis - Enterprise features - SSO/OIDC, audit logs


Similar Packages

- axonhub (v0.9.35) - ⚡️ Open-source AI Gateway - Use any SDK to call 100+ LLMs. Built-in failover, load balancing, cost control & end-to-end tracing.
- toolhive (v0.23.1) - ToolHive is an enterprise-grade platform for running and managing Model Context Protocol (MCP) servers.
- Agenvoy (v0.19.4) - Agentic framework | Self-improving memory | Pluggable tool extensions | Sandbox execution
- goclaw (v3.10.0) - GoClaw is OpenClaw rebuilt in Go - with multi-tenant isolation, 5-layer security, and native concurrency. Deploy AI agent teams at scale without compromising on safety.
- llm-gateway (v0.1.4) - Zero trust LLM gateway. OpenAI-compatible proxy with semantic routing and load balancing across OpenAI, Anthropic, Ollama, vLLM, and any compatible backend. Identity-based access, virtual A