
plano

Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.

ai-gateway ai-gateway-support envoy envoyproxy gateway generative-ai llm-gateway llm-inference rust

Why this rank: Strong adoption, recent release, healthy release cadence

README

The AI-native proxy server and data plane for agentic apps.

Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for model agility. Use any language or AI framework, and deliver agents faster to production.

Quickstart Guide • Build Agentic Apps with Plano • Documentation • Contact

Star ⭐️ the repo if you found Plano useful — new releases and updates land here first.

Overview

Building agentic demos is easy. Shipping agentic applications safely, reliably, and repeatably to production is hard. After the thrill of a quick hack, you end up building the “hidden middleware” to reach production: routing logic to reach the right agent, guardrail hooks for safety and moderation, evaluation and observability glue for continuous learning, and model/provider quirks scattered across frameworks and application code.

Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane.

🚦 Orchestration: Low-latency orchestration between agents; add new agents without modifying app code.
🔗 Model Agility: Route by model name, alias (semantic names) or automatically via preferences.
🕵 Agentic Signals™: Zero-code capture of Signals plus OTEL traces/metrics across every agent.
🛡️ Moderation & Memory Hooks: Build jailbreak protection, add moderation policies and memory consistently via Filter Chains.

Plano pulls rote plumbing out of your framework so you can stay focused on what matters most: the core product logic of your agentic applications. Plano is backed by industry-leading LLM research and built on Envoy by its core contributors, who built critical infrastructure at scale for modern workloads.

High-Level Network Sequence Diagram:

Jump to our docs to learn how you can use Plano to improve the speed, safety, and observability of your agentic applications.

Important

Plano and the Plano family of LLMs (like Plano-Orchestrator) are hosted free of charge in the US-central region to give you a great first-run developer experience of Plano. To scale and run in production, you can either run these LLMs locally or contact us on Discord for API keys.

Build Agentic Apps with Plano

Plano handles orchestration, model management, and observability as modular building blocks, letting you configure only what you need (edge proxying for agentic orchestration and guardrails, LLM routing from your services, or both together) to fit cleanly into existing architectures. Below is a simple multi-agent travel agent built with Plano that showcases all three core capabilities.

📁 Full working code: See demos/agent_orchestration/travel_agents/ for complete weather and flight agents you can run locally.

1. Define Your Agents in YAML

# config.yaml
version: v0.3.0

# What you declare: Agent URLs and natural language descriptions
# What you don't write: Intent classifiers, routing logic, model fallbacks, provider adapters, or tracing instrumentation

agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY

listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1  # Powered by our 4B-parameter routing model. You can change this to different models
    agents:
      - id: weather_agent
        description: |
          Gets real-time weather and forecasts for any city worldwide.
          Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"

      - id: flight_agent
        description: |
          Searches flights between airports with live status and schedules.
          Handles: "Flights from NYC to LA", "Show me flights to Seattle"

tracing:
  random_sampling: 100  # Auto-capture traces for evaluation
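Because each listener refers to agents by `id`, a typo in one of those ids is an easy mistake to make. The sketch below is a hypothetical pre-flight check, not part of the `planoai` CLI: it assumes the YAML has already been parsed into a plain dict and reports any listener agent id that is missing from the top-level `agents` list. The function name `find_undeclared_agents` is an illustration only.

```python
# Hypothetical helper for illustration; Plano may perform its own validation.
def find_undeclared_agents(config: dict) -> list[str]:
    """Return "listener: agent_id" entries whose id is not declared under `agents`."""
    declared = {agent["id"] for agent in config.get("agents", [])}
    missing = []
    for listener in config.get("listeners", []):
        for ref in listener.get("agents", []):
            if ref["id"] not in declared:
                missing.append(f'{listener.get("name", "<unnamed>")}: {ref["id"]}')
    return missing


# The relevant parts of config.yaml, written as the dict a YAML parser would produce.
config = {
    "agents": [
        {"id": "weather_agent", "url": "http://localhost:10510"},
        {"id": "flight_agent", "url": "http://localhost:10520"},
    ],
    "listeners": [
        {
            "type": "agent",
            "name": "travel_assistant",
            "port": 8001,
            "agents": [{"id": "weather_agent"}, {"id": "flight_agent"}],
        }
    ],
}

print(find_undeclared_agents(config))  # → []
```

An empty list means every listener reference resolves to a declared agent.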

2. Write Simple Agent Code

Your agents are just HTTP servers that implement the OpenAI-compatible chat completions endpoint. Use any language or framework:

# weather_agent.py
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()

# Point to Plano's LLM gateway - it handles model routing for you
llm = AsyncOpenAI(base_url="http://localhost:12001/v1", api_key="EMPTY")

@app.post("/v1/chat/completions")
async def chat(request: Request):
    body = await request.json()
    messages = body.get("messages", [])
    days = 7

    # Your agent logic: fetch data, call APIs, run tools
    # See demos/agent_orchestration/travel_agents/ for the full implementation
    weather_data = await get_weather_data(request, messages, days)

    # Stream the response back through Plano
    async def generate():
        stream = await llm.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "system", "content": f"Weather: {weather_data}"}, *messages],
            stream=True
        )
        async for chunk in stream:
            yield f"data: {chunk.model_dump_json()}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
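The `generate()` function frames each chunk as a server-sent event: a `data: ` prefix, the JSON body, and a blank line. As a rough illustration of that framing, the sketch below builds events the same way and reassembles the text on the receiving side. The helper names `sse_event` and `collect_text` are hypothetical, the chunk shape assumes OpenAI-style `choices[].delta.content`, and the `[DONE]` terminator is handled only in case a server sends one (the agent code here does not).

```python
import json


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a server-sent event, mirroring generate()."""
    return f"data: {json.dumps(payload)}\n\n"


def collect_text(sse_body: str) -> str:
    """Concatenate delta.content across all chat-completion chunk events."""
    parts = []
    for block in sse_body.split("\n\n"):
        block = block.strip()
        if not block.startswith("data:"):
            continue  # skip empty blocks and non-data fields
        data = block[len("data:"):].strip()
        if data == "[DONE]":  # optional OpenAI-style end marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


body = (
    sse_event({"choices": [{"delta": {"content": "Sunny, "}}]})
    + sse_event({"choices": [{"delta": {"content": "22 C"}}]})
    + "data: [DONE]\n\n"
)
print(collect_text(body))  # → Sunny, 22 C
```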

3. Start Plano & Query Your Agents

Prerequisites: Follow the prerequisites guide to install Plano and set up your environment.

# Start Plano
planoai up config.yaml
...

# Query - Plano intelligently routes to both agents in a single conversation
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "I want to travel from NYC to Paris next week. What is the weather like there, and can you find me some flights?"}
    ]
  }'
# → Plano routes to weather_agent for Paris weather ✓
# → Then routes to flight_agent for NYC → Paris flights ✓
# → Returns a complete travel plan with both weather info and flight options
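The same query can be sent from Python using only the standard library. This is a minimal sketch under a few assumptions: the URL and port come from the `travel_assistant` listener in config.yaml, the function names `build_request` and `ask` are hypothetical, and the response is returned as raw text because this page does not show the response format.

```python
import json
import urllib.request

# Listener port 8001 is taken from config.yaml.
PLANO_URL = "http://localhost:8001/v1/chat/completions"


def build_request(question: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        PLANO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> str:
    """Send the question and return the raw response body. Requires Plano to be running."""
    with urllib.request.urlopen(build_request(question), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("What is the weather like in Paris next week?")
print(req.get_method(), req.full_url)
```

Calling `ask(...)` only works once `planoai up config.yaml` has started the listener; `build_request` can be inspected without a running server.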

4. Get Observability and Model Agility for Free

Every request is traced end-to-end with OpenTelemetry - no instrumentation code needed.
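If you want your agent's own logs to line up with those traces, one option is to read the trace id from the incoming request. The sketch below assumes that Plano forwards a W3C Trace Context `traceparent` header to agents, which this page does not state, so treat it as an assumption to verify against the docs. The function name `parse_traceparent` is hypothetical, and only the version-00 header layout is handled.

```python
import re

# Version-00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str):
    """Return trace fields from a traceparent header, or None if it is invalid."""
    match = _TRACEPARENT.match(header.strip().lower())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


example = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(example)["trace_id"])  # → 4bf92f3577b34da6a3ce929d0e0e4736
```

In the FastAPI agent, the header value would come from `request.headers.get("traceparent")`, and the resulting `trace_id` can be attached to log lines.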

What You Didn't Have to Build

Infrastructure Concern	Without Plano	With Plano
Agent Orchestration	Write intent classifier + routing logic	Declare agent descriptions in YAML
Model Management	Handle each provider's API quirks	Unified LLM APIs with state management
Rich Tracing	Instrument every service with OTEL	Automatic end-to-end traces and logs
Learning Signals	Build pipeline to capture/export spans	Zero-code agentic signals
Adding Agents	Update routing code, test, redeploy	Add to config, restart

Why it's efficient: Plano uses purpose-built, lightweight LLMs (like our 4B-parameter orchestrator) instead of heavyweight frameworks or GPT-4 for routing - giving you production-grade routing at a fraction of the cost and latency.

Contact

To get in touch with us, please join our Discord server. We actively monitor it and offer support there.

Getting Started

Ready to try Plano? Check out our comprehensive documentation:

Quickstart Guide - Get up and running in minutes
LLM Routing - Route by model name, alias, or intelligent preferences
Agent Orchestration - Build multi-agent workflows
Filter Chains - Add guardrails, moderation, and memory hooks
Prompt Targets - Turn prompts into deterministic API calls
Observability - Traces, metrics, and logs

Contribution

We would love feedback on our Roadmap, and we welcome contributions to Plano! Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated. Please visit our Contribution Guide for more details.

Release History

Version	Changes	Urgency	Date
0.4.28	## What's Changed * docs: fix "quuickstart" typo and duplicated heading in supported_providers.rst by @Lagmator22 in https://github.com/katanemo/plano/pull/987 * fix: GPT-5.6 SSE streams truncated through /v1/responses (chunk framing, raw identity passthrough, gzip window) by @Jiliac in https://github.com/katanemo/plano/pull/988 * feat(routing): automatic prompt caching + a per-session routing budget by @Spherrrical in https://github.com/katanemo/plano/pull/982 * release 0.4.28 by @Spherrric	High	7/21/2026
0.4.27	## What's Changed * fix(hermesllm): preserve output_text for Responses API multi-turn by @Spherrrical in https://github.com/katanemo/plano/pull/978 * feat(hermesllm): add MiniMax provider by @octo-patch in https://github.com/katanemo/plano/pull/981 * add Meta Model API provider for Muse Spark 1.1 by @Spherrrical in https://github.com/katanemo/plano/pull/984 * fix(docker): remove curl to drop vulnerable libssh2 transitive dep by @Spherrrical in https://github.com/katanemo/plano/pull/986 * re	High	7/9/2026
0.4.26	## What's Changed * fix(docs): pin sphinxawesome-theme to <6.0.0 by @Spherrrical in https://github.com/katanemo/plano/pull/968 * fix(ci): switch retired claude-sonnet-4-20250514 to claude-sonnet-4-6 by @Spherrrical in https://github.com/katanemo/plano/pull/975 * feat: configurable model pricing source by @Spherrrical in https://github.com/katanemo/plano/pull/971 * Remove deprecated legacy signal OTel attributes by @Spherrrical in https://github.com/katanemo/plano/pull/976 * feat(tracing): p	High	6/25/2026
0.4.25	## What's Changed * Add the system role into messages array by @ShivaniKumar1 in https://github.com/katanemo/plano/pull/967 * release 0.4.25 by @Spherrrical in https://github.com/katanemo/plano/pull/969 ## New Contributors * @ShivaniKumar1 made their first contribution in https://github.com/katanemo/plano/pull/967 Full Changelog: https://github.com/katanemo/plano/compare/0.4.24...0.4.25	High	6/15/2026
0.4.24	## What's Changed * chore(models): update provider models by @Spherrrical in https://github.com/katanemo/plano/pull/965 * release 0.4.24 by @Spherrrical in https://github.com/katanemo/plano/pull/966 Full Changelog: https://github.com/katanemo/plano/compare/0.4.23...0.4.24	High	6/9/2026
0.4.23	## What's Changed * ci: add zero-config smoke test for `planoai up` with no args by @adilhafeez in https://github.com/katanemo/plano/pull/919 * fix(brightstaff): enable TLS for redis session cache by @Spherrrical in https://github.com/katanemo/plano/pull/934 * ci+fix: add update-providers workflow + non-destructive fetch_models by @Spherrrical in https://github.com/katanemo/plano/pull/914 * Validate model listener filter references before serving traffic by @mukeshbaphna in https://github.co	High	6/3/2026
0.4.22	## What's Changed * fix(anthropic-stream): avoid bare/duplicate message_stop on OpenAI upstream by @adilhafeez in https://github.com/katanemo/plano/pull/898 * fix: prevent index-out-of-bounds panic in signal analyzer follow-up by @adilhafeez in https://github.com/katanemo/plano/pull/896 * Add claude-opus-4-7 to anthropic provider models by @adilhafeez in https://github.com/katanemo/plano/pull/901 * Fix request closures during long-running streaming by @adilhafeez in https://github.com/katane	High	4/24/2026
0.4.20	## What's Changed * add Plano agent skills framework and rule set by @Spherrrical in https://github.com/katanemo/plano/pull/797 * Add DigitalOcean as a first-class LLM provider by @adilhafeez in https://github.com/katanemo/plano/pull/889 * Zero-config planoai up: pass-through proxy with auto-detected providers by @adilhafeez in https://github.com/katanemo/plano/pull/890 * planoai obs: live LLM observability TUI by @adilhafeez in https://github.com/katanemo/plano/pull/891 * fix: passthrough_	High	4/18/2026
0.4.19	## What's Changed * Redis-backed session cache for cross-replica model affinity by @Spherrrical in https://github.com/katanemo/plano/pull/879 * use plano-orchestrator for LLM routing, remove arch-router by @adilhafeez in https://github.com/katanemo/plano/pull/886 * release 0.4.19 by @adilhafeez in https://github.com/katanemo/plano/pull/887 Full Changelog: https://github.com/katanemo/plano/compare/0.4.18...0.4.19	High	4/15/2026
0.4.18	## What's Changed * Add first-class Xiaomi provider support by @Spherrrical in https://github.com/katanemo/plano/pull/863 * Model affinity for consistent model selection in agentic loops by @adilhafeez in https://github.com/katanemo/plano/pull/827 * release 0.4.18 by @Spherrrical in https://github.com/katanemo/plano/pull/878 Full Changelog: https://github.com/katanemo/plano/compare/0.4.17...0.4.18	High	4/9/2026
0.4.17	## What's Changed * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/860 * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/862 * fix: resolve all open Dependabot security alerts by @adilhafeez in https://github.com/katanemo/plano/pull/866 * Publish docker images to DigitalOcean Container Registry by @adilhafeez in https://github.com/katanemo/plano/pull/868	High	4/3/2026
0.4.17-rc1	## What's Changed * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/860 * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/862 * fix: resolve all open Dependabot security alerts by @adilhafeez in https://github.com/katanemo/plano/pull/866 * Publish docker images to DigitalOcean Container Registry by @adilhafeez in https://github.com/katanemo/plano/pull/868	Medium	4/3/2026
0.4.16	## What's Changed * Update black hook for Python 3.14 by @Spherrrical in https://github.com/katanemo/plano/pull/857 * Polish planoai up/down CLI output by @Spherrrical in https://github.com/katanemo/plano/pull/858 * replace production panics with graceful error handling in common crate by @adilhafeez in https://github.com/katanemo/plano/pull/844 * fix: route Perplexity OpenAI endpoints without /v1 by @Spherrrical in https://github.com/katanemo/plano/pull/854 * Handle null prefer in inline r	Medium	4/1/2026
0.4.15	## What's Changed * expand configuration reference with missing fields by @Spherrrical in https://github.com/katanemo/plano/pull/851 * model routing: cost/latency ranking with ranked fallback list by @adilhafeez in https://github.com/katanemo/plano/pull/849 * restructure model_metrics_sources to type + provider by @adilhafeez in https://github.com/katanemo/plano/pull/855 * release 0.4.15 by @adilhafeez in https://github.com/katanemo/plano/pull/853 Full Changelog: https://github.com/	Medium	3/31/2026
0.4.14	## What's Changed * the orchestrator had a bug where it was setting the wrong headers for… by @salmanap in https://github.com/katanemo/plano/pull/839 * release 0.4.14 by @adilhafeez in https://github.com/katanemo/plano/pull/840 Full Changelog: https://github.com/katanemo/plano/compare/0.4.13...0.4.14	Low	3/20/2026
0.4.13	## Highlights Key Features - Output Filter Chain — Adds support for output filter chains, enabling guardrails and processing on LLM responses in addition to the existing input/prompt filters (#822) - Kubernetes Deployment Support — New K8s manifests and docs for self-hosted Arch-Router (vLLM) with GPU support, including deployment YAMLs and in-cluster routing config (#831) Improvements - Brightstaff Refactor — Extracted `AppState` struct, cleaner error propagation, graceful	Low	3/20/2026
0.4.12	## Highlights ### Key Features - Codex support: Plano now works with OpenAI Codex (#808) - Routing service: New routing service + support for inline `routing_policy` in request bodies (#814, #815) - Unified model overrides: Single config for custom router and orchestrator models (#820) - New supported models: Added new LLM models to Plano (#829) ### Improvements - Native mode logs: `planoai logs` now works in native (non-Docker) mode (#807)	Low	3/15/2026
0.4.11	## Highlights Native mode is now the default — `uv tool install planoai` just works. No Docker, no Rust toolchain, no repo clone needed. Pre-compiled binaries (Envoy, WASM plugins, brightstaff) are automatically downloaded on first run. - Run with Docker using `planoai up config.yaml --docker` (opt-in) - Download progress bars show real-time status for binary downloads Full Changelog: https://github.com/katanemo/plano/compare/0.4.10...0.4.11 ## What's Changed * fix: strip t	Low	3/5/2026
0.4.9	## What's Changed * Make model field optional with default provider fallback by @adilhafeez in https://github.com/katanemo/plano/pull/768 * Add workflow preferences to CLAUDE.md by @adilhafeez in https://github.com/katanemo/plano/pull/770 * updating architecture diagram by @salmanap in https://github.com/katanemo/plano/pull/774 * [ISSUE 706]: Standardize	Low	2/27/2026
0.4.8	## What's Changed * Add OpenClaw + Plano intelligent routing demo by @adilhafeez in https://github.com/katanemo/plano/pull/761 * sync CLI templates with demo configs via manifest + CI flow by @Spherrrical in https://github.com/katanemo/plano/pull/764 * docs: Fix incorrect routing preferences in OpenClaw demo config by @tejasunku in https://github.com/katanemo/plano/pull/765 * Upstream TLS validation and configurable connect timeout by @adilhafeez in https://github.com/katanemo/plano/pull/766	Low	2/18/2026
0.4.7	## What's Changed * Supporting OpenClaw routing via Plano. Plus many quality of improvement updates * Add CLAUDE.md for Claude Code onboarding by @adilhafeez in https://github.com/katanemo/plano/pull/743 * Upgrade Python base images to 3.13.11 to fix CVE-2025-13836 by @adilhafeez in https://github.com/katanemo/plano/pull/751 * updated the mo	Low	2/17/2026
0.4.6	## What's Changed * Site clean by @salmanap in https://github.com/katanemo/plano/pull/716 * add logo carousel to katanemo.com to showcase companies by @Spherrrical in https://github.com/katanemo/plano/pull/718 * Removing duplicate lines by @san81 in https://github.com/katanemo/plano/pull/719 * upgrade rust to 1.93.0 and fix pre-commit by @adilhafeez in https://github.com/katanemo/plano/pull/720 * fixing the README for multi-agent orchestration by @salmanap in https://github.com/katanemo/pl	Low	2/11/2026
0.4.4	## What's Changed * add default agent schema enforcement by @adilhafeez in https://github.com/katanemo/plano/pull/702 * add ability to set agent timeout by @adilhafeez in https://github.com/katanemo/plano/pull/710 * Adding support for wildcard models in the model_providers config by @salmanap in https://github.com/katanemo/plano/pull/696 * introduce SEO optimization and improve blog content rendering by @Spherrrical in https://github.com/katanemo/plano/pull/709 * fixing the build scripts fo	Low	1/29/2026
0.4.3	## What's Changed * tweaks to web and docs to align to 0.4.2 by @salmanap in https://github.com/katanemo/plano/pull/680 * remove unnecessary clones from code by @adilhafeez in https://github.com/katanemo/plano/pull/682 * Bump next from 16.0.0 to 16.0.10 in /packages/ui by @dependabot[bot] in https://github.com/katanemo/plano/pull/684 * don't include internal models in /v1/models endpoint by @adilhafeez in https://github.com/katanemo/plano/pull/685 * http-filter: add fully http based demo (r	Low	1/18/2026
0.4.2	## What's Changed * Revert "release 0.4.1" by @adilhafeez in https://github.com/katanemo/plano/pull/669 * release 0.4.1 by @adilhafeez in https://github.com/katanemo/plano/pull/670 * update quick start to elevate gateway/proxy example by @adilhafeez in https://github.com/katanemo/plano/pull/671 * simplify readme and point links to docs.planoai.dev by @adilhafeez in https://github.com/katanemo/plano/pull/672 * React: Security vulnerabilities resolved by @Spherrrical in https://github.com/kat	Low	1/7/2026
0.4.1	## What's Changed * update mcp_filter docs and talk about docker build and jaeger ui by @adilhafeez in https://github.com/katanemo/plano/pull/652 * add open-web-ui-ref to mcp_filter demo readme by @adilhafeez in https://github.com/katanemo/plano/pull/653 * restructure cli by @adilhafeez in https://github.com/katanemo/plano/pull/656 * publish planoai package from gh action by @adilhafeez in https://github.com/katanemo/plano/pull/657 * update workspace by @adilhafeez in https://github.com/kat	Low	12/28/2025
0.4.0	## What's Changed * Improve end to end tracing by @salmanap in https://github.com/katanemo/plano/pull/628 * fixed mixed inputs from openai v1/responses api by @salmanap in https://github.com/katanemo/plano/pull/632 * enable state management for v1/responses by @salmanap in https://github.com/katanemo/plano/pull/631 * orchestration integration by @nehcgs in https://github.com/katanemo/plano/pull/623 * Use mcp tools for filter chain by @adilhafeez in https://github.com/katanemo/plano/pull/621	Low	12/24/2025
0.3.22	## What's Changed * handle agent error better by @adilhafeez in https://github.com/katanemo/archgw/pull/627 * release 0.3.22 by @adilhafeez in https://github.com/katanemo/archgw/pull/629 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.21...0.3.22	Low	12/11/2025
0.3.21	## What's Changed * Add support for v1/responses API by @salmanap in https://github.com/katanemo/archgw/pull/622 * release 0.3.21 by @adilhafeez in https://github.com/katanemo/archgw/pull/626 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.20...0.3.21	Low	12/4/2025
0.3.20	## What's Changed * removing model_server python module to brightstaff (function calling) by @salmanap in https://github.com/katanemo/archgw/pull/615 * removing model_server. buh bye by @salmanap in https://github.com/katanemo/archgw/pull/619 * release 0.3.20 by @adilhafeez in https://github.com/katanemo/archgw/pull/620 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.18...0.3.20	Low	11/23/2025
0.3.18	## What's Changed * fixing a bug where by we were writing the cluster_name for an upstrea… by @salmanap in https://github.com/katanemo/archgw/pull/607 * support base_url path for model providers by @salmanap in https://github.com/katanemo/archgw/pull/608 * support python 3.14 by @branchvincent in https://github.com/katanemo/archgw/pull/605 * release 0.3.18 by @adilhafeez in https://github.com/katanemo/archgw/pull/611 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.17.	Low	10/31/2025
0.3.17	## What's Changed * fix console logs by @adilhafeez in https://github.com/katanemo/archgw/pull/598 * fix config generator bug by @adilhafeez in https://github.com/katanemo/archgw/pull/599 * fixed bug in Bedrock translation code and dramatically improved tracing for outbound LLM traffic by @salmanap in https://github.com/katanemo/archgw/pull/601 * move pytest	Low	10/25/2025
0.3.16	## What's Changed * remove proxy-wasm integration tests by @adilhafeez in https://github.com/katanemo/archgw/pull/580 * stream access logs and improve access log format by @adilhafeez in https://github.com/katanemo/archgw/pull/581 * renaming branch by @salmanap in https://github.com/katanemo/archgw/pull/582 * adding support for Qwen models and fixed issue with passing PATH vari… by @salmanap in https://github.com/katanemo/archgw/pull/583 * fixing docs by @salmanap in https://github.com/kata	Low	10/22/2025
0.3.15	## What's Changed * adding support for moonshot and z-ai by @salmanap in https://github.com/katanemo/archgw/pull/578 * release 0.3.15 by @adilhafeez in https://github.com/katanemo/archgw/pull/579 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.14...0.3.15	Low	9/30/2025
0.3.14	## What's Changed * fixed changes related to max_tokens and processing http error codes l… by @salmanap in https://github.com/katanemo/archgw/pull/574 * adding support for claude code routing by @salmanap in https://github.com/katanemo/archgw/pull/575 * fixing README for claude code and adding a helper script to show mode… by @salmanap in https://github.com/katanemo/archgw/pull/576 * release 0.3.14 by @adilhafeez in https://github.com/katanemo/archgw/pull/577 Full Changelog: https:/	Low	9/30/2025
0.3.13	## What's Changed * add default implementation for common openai types by @adilhafeez in https://github.com/katanemo/archgw/pull/568 * adding code snippets in a single place for newsletter by @salmanap in https://github.com/katanemo/archgw/pull/569 * draft commit to add support for xAI, TogehterAI, AzureOpenAI by @salmanap in https://github.com/katanemo/archgw/pull/570 * Salmanap/fix docs new providers model alias by @salmanap in https://github.com/katanemo/archgw/pull/571 * release 0.3.13	Low	9/19/2025
0.3.12	## What's Changed * adding support for model aliases in archgw by @salmanap in https://github.com/katanemo/archgw/pull/566 * release 0.3.12 by @adilhafeez in https://github.com/katanemo/archgw/pull/567 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.11...0.3.12	Low	9/16/2025
0.3.11	## What's Changed * updating the implementation of /v1/chat/completions to use the generi… by @salmanap in https://github.com/katanemo/archgw/pull/548 * updating readme and see how it flows by @salmanap in https://github.com/katanemo/archgw/pull/556 * add support for v1/messages and transformations by @salmanap in https://github.com/katanemo/archgw/pull/558 * release 0.3.11 by @adilhafeez in https://github.com/katanemo/archgw/pull/565 Full Changelog: https://github.com/katanemo/arch	Low	9/12/2025
0.3.10	## What's Changed * publish to ghrc by @adilhafeez in https://github.com/katanemo/archgw/pull/553 * update base image to python3.13 by @adilhafeez in https://github.com/katanemo/archgw/pull/554 * release 0.3.10 by @adilhafeez in https://github.com/katanemo/archgw/pull/555 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.9...0.3.10	Low	8/13/2025
0.3.9	## What's Changed * fix cve_2025-6020 by removing libpam by @adilhafeez in https://github.com/katanemo/archgw/pull/551 * release 0.3.9 by @adilhafeez in https://github.com/katanemo/archgw/pull/552 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.8...0.3.9	Low	8/12/2025
0.3.8	## What's Changed * Fix code block formatting in LLM Provider documentation by @Spherrrical in https://github.com/katanemo/archgw/pull/543 * archgw_model_server: use sys.executable for uv tool install compat by @kafonek in https://github.com/katanemo/archgw/pull/544 * consistent messaging by @salmanap in https://github.com/katanemo/archgw/pull/546 * pushing new apis module for hermes by @salmanap in https://github.com/katanemo/archgw/pull/547 * update torch==2.6.0 by @adilhafeez in https://	Low	8/11/2025
0.3.7	## What's Changed * bug fix - allow image content to pass through by @adilhafeez in https://github.com/katanemo/archgw/pull/539 * release 0.3.7 by @adilhafeez in https://github.com/katanemo/archgw/pull/542 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.6...0.3.7	Low	7/26/2025
0.3.6	## What's Changed * In request path use same format for usage preferences as arch_config by @adilhafeez in https://github.com/katanemo/archgw/pull/533 * release 0.3.6 by @adilhafeez in https://github.com/katanemo/archgw/pull/536 Full Changelog: https://github.com/katanemo/archgw/compare/0.3.5...0.3.6	Low	7/22/2025
0.3.5	## What's Changed * updating the messaging to call ourselves the edge and AI gateway for … by @salmanap in https://github.com/katanemo/archgw/pull/527 * chatgpt.com updated its backend api path. fixing by @salmanap in https://github.com/katanemo/archgw/pull/530 * pass model name in header when a route is selected when using usage p… by @adilhafeez in https://github.com/katanemo/archgw/pull/531 * refactor logging in brightstaff by @adilhafeez in https://github.com/katanemo/archgw/pull/532 *	Low	7/21/2025
0.3.4	# What's Changed ## Breaking changes ### arch_config file format change In llm_providers section we now allow model to contain provider as part of the model definition. This is to simplify the llm_providers section and to allow more concise way to defining providers, Here is a sample [llm_provider definition](https://github.com/katanemo/archgw/blob/main/demos/use_cases/llm_routing/arch_config.yaml#L12-L17) after this change, ``` - access_key: $OPENAI_API_KEY model: openai/	Low	7/12/2025
0.3.3	## What's Changed * pushing docs updated by @salmanap in https://github.com/katanemo/archgw/pull/508 * local support for Arch-Router via Ollama by @salmanap in https://github.com/katanemo/archgw/pull/509 * updating the REAMDE to reflect preference based routing and clean up … by @salmanap in https://github.com/katanemo/archgw/pull/512 * Add support for updating model preferences by @adilhafeez in https://github.com/katanemo/archgw/pull/510 * make arch-router cluster optional by @adilhafeez	Low	7/8/2025
0.3.2	## What's Changed * update readme for preference based routing by @adilhafeez in https://github.com/katanemo/archgw/pull/496 * use consistent version across all arch_config files by @adilhafeez in https://github.com/katanemo/archgw/pull/497 * don't run docker compose up for preference based router e2e demo tests by @adilhafeez in https://github.com/katanemo/archgw/pull/499 * Add ARCH_API_KEY in preference based routing demo by @adilhafeez in https://github.com/katanemo/archgw/pull/498 * Upd	Low	6/14/2025
0.3.1	## What's Changed * add claude-4 in llm_routing demo by @adilhafeez in https://github.com/katanemo/archgw/pull/486 * trim conversation if it exceed max limit of what router model can handle by @adilhafeez in https://github.com/katanemo/archgw/pull/488 * add compress/decompress filter to llm listener by @adilhafeez in https://github.com/katanemo/archgw/pull/489 * add support for openwebui by @adilhafeez in https://github.com/katanemo/archgw/pull/487 * use provider_name as model_id /v1/models	Low	5/31/2025
0.3.0	## What's Changed * use separate host port for chat ui and for app_server by @adilhafeez in https://github.com/katanemo/archgw/pull/473 * updating README based on reddit feedback by @salmanap in https://github.com/katanemo/archgw/pull/474 * update arch_config sample on readme to match with new format by @adilhafeez in https://github.com/katanemo/archgw/pull/475 * Introduce brightstaff a new terminal service for llm routing by @adilhafeez in https://github.com/katanemo/archgw/pull/477 * add	Low	5/23/2025
0.2.8	## What's Changed * use archfc v1.1 on archfc.katanemo.dev by @adilhafeez in https://github.com/katanemo/archgw/pull/471 * release 0.2.8 by @adilhafeez in https://github.com/katanemo/archgw/pull/472 Full Changelog: https://github.com/katanemo/archgw/compare/0.2.7...0.2.8	Low	4/22/2025
0.2.7	## What's Changed * publish docker images for every release we cut - https://github.com/katanemo/archgw/compare/c7c0553427d4e00b0254838d1a7fc0e9567645a4...6d6c03a7e81f99db4f67c06c158fd2bf6d8de660 * release 0.2.7 by @adilhafeez in https://github.com/katanemo/archgw/pull/469 Full Changelog: https://github.com/katanemo/archgw/compare/0.2.6...0.2.7	Low	4/16/2025
0.2.6.5	Release 0.2.6.5	Low	4/16/2025
0.2.6.4	Release 0.2.6.4	Low	4/16/2025
0.2.6.3	Release 0.2.6.3	Low	4/16/2025
0.2.6.2	Release 0.2.6.2	Low	4/16/2025
0.2.6.1	Release 0.2.6.1	Low	4/16/2025
0.2.6	## What's Changed * [chore] Tweak readme docs for minor nits by @darkdatter in https://github.com/katanemo/archgw/pull/461 * fixed issue with groq LLMs that require the openai in the /v1/chat/co… by @salmanap in https://github.com/katanemo/archgw/pull/460 * Integrate Arch-Function-Chat by @nehcgs in https://github.com/katanemo/archgw/pull/449 * release 0.2.6 by @adilhafeez in https://github.com/katanemo/archgw/pull/463 ## New Contributors * @darkdatter made their first contribution in ht	Low	4/15/2025

Similar Packages

bifrost - Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS. (ent-v2.0.0-prerelease2-base)

control-layer - The world’s fastest AI model gateway (450x less overhead than LiteLLM). Unified access to LLMs across endpoints (openAI, self-hosted, etc.) behind a single authentication layer - with API key generati (v8.101.0)

edgee - Open-source AI gateway written in Rust, with token compression for Claude Code, Codex... and any other LLM client. (v0.3.0)

hub - High-scale LLM gateway, written in Rust. OpenTelemetry-based observability included (0.10.1)

llm-gateway - Zero trust LLM gateway. OpenAI-compatible proxy with semantic routing and load balancing across OpenAI, Anthropic, Ollama, vLLM, and any compatible backend. Identity-based access, virtual A (v0.1.5)

More in Infrastructure

llm7.io - LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.

models - This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.

chak-ai - A simple, yet handy, LLM gateway.