
smg


Description

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

README

SMG Logo

Shepherd Model Gateway


High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

SMG Architecture

Why SMG?

πŸš€ Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state, whether SGLang, vLLM, or TensorRT-LLM, to reuse prefixes and reduce redundant computation.
πŸ”Œ One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚑ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
πŸ”’ Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
πŸ“Š Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
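To make the cache-aware idea concrete, here is a toy sketch (hypothetical, not SMG's actual implementation; `pick_worker`, the per-worker prompt lists standing in for real KV-cache state, and the match threshold are all illustrative): route each request to the worker whose recent prompts share the longest prefix with the new one, and fall back to the least-loaded worker when no overlap is useful.

```python
def longest_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_worker(prompt: str, workers: dict, threshold: int = 8) -> str:
    """Pick the worker whose cached prompts best match `prompt`.

    `workers` maps worker URL -> list of recently served prompts
    (a stand-in for real KV-cache state). Falls back to the worker
    with the fewest cached entries when no match clears `threshold`.
    """
    best_worker, best_match = None, -1
    for url, cached in workers.items():
        match = max((longest_prefix_len(prompt, c) for c in cached), default=0)
        if match > best_match:
            best_worker, best_match = url, match
    if best_match >= threshold:
        return best_worker
    # No useful prefix overlap: balance by (approximate) load instead.
    return min(workers, key=lambda u: len(workers[u]))

workers = {
    "http://gpu1:8000": ["You are a helpful assistant. Summarize:"],
    "http://gpu2:8000": ["Translate to French:"],
}
# Shares a long system-prompt prefix with gpu1's cache, so gpu1 wins.
print(pick_worker("You are a helpful assistant. Extract entities:", workers))
```

Real engines expose richer signals than recent prompts, but the shape of the decision (prefix match first, load second) is the same.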

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install β€” pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run β€” point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use β€” send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
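Because the gateway speaks the OpenAI wire format, streamed responses arrive as server-sent `data:` lines. A minimal sketch of consuming such a stream (illustrative; it assumes the standard OpenAI-style chunk shape, not any SMG-specific format):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from an OpenAI-style SSE stream.

    Each event line looks like:
        data: {"choices":[{"delta":{"content":"Hi"}}]}
    and the stream ends with:
        data: [DONE]
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example: reassemble a streamed completion from raw SSE lines.
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(raw)))
```

In practice you would read these lines from the HTTP response of a request with `"stream": true`; any OpenAI-compatible client library handles this for you.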

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | `cache_aware`, `round_robin`, `power_of_two`, `consistent_hashing`, `prefix_hash`, `manual`, `random`, `bucket` |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
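As a rough illustration of one of these policies, power-of-two-choices samples two workers at random and sends the request to the less loaded one, which avoids both the cost of scanning every worker and the pile-ups of pure random selection. A toy sketch (hypothetical, not SMG's scheduler; the `loads` map is illustrative):

```python
import random

def power_of_two(loads: dict, rng: random.Random) -> str:
    """Power-of-two-choices: sample two workers, pick the less loaded.

    `loads` maps worker URL -> in-flight request count.
    """
    a, b = rng.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

loads = {"http://gpu1:8000": 5, "http://gpu2:8000": 1, "http://gpu3:8000": 9}
rng = random.Random(0)
picks = [power_of_two(loads, rng) for _ in range(100)]
# The most heavily loaded worker loses every pairwise comparison,
# so it receives far less traffic than the lightly loaded one.
print(picks.count("http://gpu3:8000") < picks.count("http://gpu2:8000"))
```

The classic result is that two random choices cut the maximum queue length exponentially compared to one, which is why this policy is a common default when no cache signal is available.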

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See Contributing Guide for details.

Release History

| Version | Changes | Urgency | Date |
|---|---|---|---|
| v1.4.1 | Patch release: mesh HA stability fix (the health checker no longer removes mesh-synced workers marked `health: false` before they can pass a local health check during rolling deploys; it now only removes workers whose health check actually failed), DP rank scheduling, reasoning parser fixes, and engine version bumps. | High | 4/9/2026 |
| v1.4.0 | The biggest SMG release yet: Kubernetes-native deployment via a production-ready Helm chart, a terminal dashboard, 200x mesh memory reduction, 7-11x faster multimodal preprocessing, native Completion API over gRPC, and per-model retry configuration. | High | 4/2/2026 |

