
smg


Description

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

README

SMG Logo

Shepherd Model Gateway


High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

SMG Architecture

Why SMG?

πŸš€ Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state, whether SGLang, vLLM, or TensorRT-LLM, to reuse prefixes and reduce redundant computation.
πŸ”Œ One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚑ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
πŸ”’ Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
πŸ“Š Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
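To make the cache-aware idea concrete, here is a toy sketch (hypothetical, not SMG's actual implementation; `pick_worker`, the per-worker prompt lists standing in for real KV-cache state, and the match threshold are all illustrative): route each request to the worker whose recent prompts share the longest prefix with the new one, and fall back to the least-loaded worker when no overlap is useful.

```python
def longest_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_worker(prompt: str, workers: dict, threshold: int = 8) -> str:
    """Pick the worker whose cached prompts best match `prompt`.

    `workers` maps worker URL -> list of recently served prompts
    (a stand-in for real KV-cache state). Falls back to the worker
    with the fewest cached entries when no match clears `threshold`.
    """
    best_worker, best_match = None, -1
    for url, cached in workers.items():
        match = max((longest_prefix_len(prompt, c) for c in cached), default=0)
        if match > best_match:
            best_worker, best_match = url, match
    if best_match >= threshold:
        return best_worker
    # No useful prefix overlap: balance by (approximate) load instead.
    return min(workers, key=lambda u: len(workers[u]))

workers = {
    "http://gpu1:8000": ["You are a helpful assistant. Summarize:"],
    "http://gpu2:8000": ["Translate to French:"],
}
# Shares a long system-prompt prefix with gpu1's cache, so gpu1 wins.
print(pick_worker("You are a helpful assistant. Extract entities:", workers))
```

Real engines expose richer signals than recent prompts, but the shape of the decision (prefix match first, load second) is the same.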

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install β€” pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run β€” point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use β€” send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
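Because the gateway speaks the OpenAI wire format, streamed responses arrive as server-sent `data:` lines. A minimal sketch of consuming such a stream (illustrative; it assumes the standard OpenAI-style chunk shape, not any SMG-specific format):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from an OpenAI-style SSE stream.

    Each event line looks like:
        data: {"choices":[{"delta":{"content":"Hi"}}]}
    and the stream ends with:
        data: [DONE]
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example: reassemble a streamed completion from raw SSE lines.
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(raw)))
```

In practice you would read these lines from the HTTP response of a request with `"stream": true`; any OpenAI-compatible client library handles this for you.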

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | `cache_aware`, `round_robin`, `power_of_two`, `consistent_hashing`, `prefix_hash`, `manual`, `random`, `bucket` |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
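As a rough illustration of one of these policies, power-of-two-choices samples two workers at random and sends the request to the less loaded one, which avoids both the cost of scanning every worker and the pile-ups of pure random selection. A toy sketch (hypothetical, not SMG's scheduler; the `loads` map is illustrative):

```python
import random

def power_of_two(loads: dict, rng: random.Random) -> str:
    """Power-of-two-choices: sample two workers, pick the less loaded.

    `loads` maps worker URL -> in-flight request count.
    """
    a, b = rng.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

loads = {"http://gpu1:8000": 5, "http://gpu2:8000": 1, "http://gpu3:8000": 9}
rng = random.Random(0)
picks = [power_of_two(loads, rng) for _ in range(100)]
# The most heavily loaded worker loses every pairwise comparison,
# so it receives far less traffic than the lightly loaded one.
print(picks.count("http://gpu3:8000") < picks.count("http://gpu2:8000"))
```

The classic result is that two random choices cut the maximum queue length exponentially compared to one, which is why this policy is a common default when no cache signal is available.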

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See Contributing Guide for details.

Release History

| Version | Changes | Urgency | Date |
|---|---|---|---|
| v1.4.1 | Patch release: mesh HA stability fix (the health checker no longer removes mesh-synced workers marked `health: false` before they can pass a local health check during rolling deploys; it now only removes workers whose health check actually failed), DP rank scheduling, reasoning parser fixes, and engine version bumps. | High | 4/9/2026 |
| v1.4.0 | The biggest SMG release yet: Kubernetes-native deployment via a production-ready Helm chart, a terminal dashboard, 200x mesh memory reduction, 7-11x faster multimodal preprocessing, native Completion API over gRPC, and per-model retry configuration. | High | 4/2/2026 |

