
plano

Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.

Why this rank: Strong adoption, recent release, healthy release cadence


README


The AI-native proxy server and data plane for agentic apps.

Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for model agility. Use any language or AI framework, and deliver agents faster to production.

Quickstart Guide | Build Agentic Apps with Plano | Documentation | Contact


Star ⭐️ the repo if you found Plano useful — new releases and updates land here first.

Overview

Building agentic demos is easy. Shipping agentic applications safely, reliably, and repeatably to production is hard. After the thrill of a quick hack, you end up building the “hidden middleware” to reach production: routing logic to reach the right agent, guardrail hooks for safety and moderation, evaluation and observability glue for continuous learning, and model/provider quirks scattered across frameworks and application code.

Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane.

Plano pulls rote plumbing out of your framework so you can stay focused on what matters most: the core product logic of your agentic applications. Plano is backed by industry-leading LLM research and built on Envoy by its core contributors, who built critical infrastructure at scale for modern workloads.

Figure: High-level network sequence diagram of the Plano architecture

Jump to our docs to learn how you can use Plano to improve the speed, safety, and observability of your agentic applications.

Important

Plano and the Plano family of LLMs (like Plano-Orchestrator) are hosted free of charge in the US-central region to give you a great first-run developer experience of Plano. To scale and run in production, you can either run these LLMs locally or contact us on Discord for API keys.


Build Agentic Apps with Plano

Plano handles orchestration, model management, and observability as modular building blocks, letting you configure only what you need (edge proxying for agentic orchestration and guardrails, LLM routing from your services, or both together) to fit cleanly into existing architectures. Below is a simple multi-agent travel agent built with Plano that showcases all three core capabilities:

📁 Full working code: See demos/agent_orchestration/travel_agents/ for complete weather and flight agents you can run locally.

1. Define Your Agents in YAML

# config.yaml
version: v0.3.0

# What you declare: Agent URLs and natural language descriptions
# What you don't write: Intent classifiers, routing logic, model fallbacks, provider adapters, or tracing instrumentation

agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY

listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1  # Powered by our 4B-parameter routing model. You can change this to different models
    agents:
      - id: weather_agent
        description: |
          Gets real-time weather and forecasts for any city worldwide.
          Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"

      - id: flight_agent
        description: |
          Searches flights between airports with live status and schedules.
          Handles: "Flights from NYC to LA", "Show me flights to Seattle"

tracing:
  random_sampling: 100  # Auto-capture traces for evaluation
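The config above implies that every agent id referenced under a listener should also appear in the top-level agents list. Below is a minimal sketch of that cross-check, assuming the YAML has already been parsed into a plain dict. `find_undeclared_agents` and `CONFIG` are hypothetical names used for illustration and are not part of Plano or its CLI.

```python
# Hypothetical pre-flight check; not part of the planoai CLI.
# CONFIG mirrors the relevant parts of config.yaml as an already-parsed dict.
CONFIG = {
    "agents": [
        {"id": "weather_agent", "url": "http://localhost:10510"},
        {"id": "flight_agent", "url": "http://localhost:10520"},
    ],
    "listeners": [
        {
            "type": "agent",
            "name": "travel_assistant",
            "port": 8001,
            "agents": [{"id": "weather_agent"}, {"id": "flight_agent"}],
        }
    ],
}


def find_undeclared_agents(config: dict) -> list[str]:
    """Return listener agent ids that have no entry in the top-level agents list."""
    declared = {agent["id"] for agent in config.get("agents", [])}
    missing = []
    for listener in config.get("listeners", []):
        for ref in listener.get("agents", []):
            if ref["id"] not in declared:
                missing.append(ref["id"])
    return missing
```

An empty result means every listener reference resolves. Running a check like this before starting the proxy can catch a typo in an agent id early.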

2. Write Simple Agent Code

Your agents are just HTTP servers that implement the OpenAI-compatible chat completions endpoint. Use any language or framework:

# weather_agent.py
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()

# Point to Plano's LLM gateway - it handles model routing for you
llm = AsyncOpenAI(base_url="http://localhost:12001/v1", api_key="EMPTY")

@app.post("/v1/chat/completions")
async def chat(request: Request):
    body = await request.json()
    messages = body.get("messages", [])
    days = 7

    # Your agent logic: fetch data, call APIs, run tools
    # See demos/agent_orchestration/travel_agents/ for the full implementation
    weather_data = await get_weather_data(request, messages, days)

    # Stream the response back through Plano
    async def generate():
        stream = await llm.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "system", "content": f"Weather: {weather_data}"}, *messages],
            stream=True
        )
        async for chunk in stream:
            yield f"data: {chunk.model_dump_json()}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
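The `generate()` function above emits Server-Sent Events: each chunk is a `data: <json>` line followed by a blank line. Below is a small standard-library sketch of that framing and of how a client might split it back apart. `sse_frame` and `parse_sse` are illustrative names, and the payloads in the usage example are made-up stand-ins for real completion chunks.

```python
import json


def sse_frame(payload: dict) -> str:
    """Encode one JSON payload as a single SSE event."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(stream_text: str) -> list[dict]:
    """Decode a buffer of SSE events back into JSON payloads."""
    events = []
    for block in stream_text.split("\n\n"):
        if not block.startswith("data: "):
            continue
        data = block[len("data: "):]
        # OpenAI-style streams end with a literal "[DONE]" sentinel, not JSON.
        if data == "[DONE]":
            break
        events.append(json.loads(data))
    return events
```

For example, `parse_sse(sse_frame({"delta": "Sunny"}) + "data: [DONE]\n\n")` returns a one-element list containing the original dict. A real client would read the response incrementally rather than buffering the whole stream.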

3. Start Plano & Query Your Agents

Prerequisites: Follow the prerequisites guide to install Plano and set up your environment.

# Start Plano
planoai up config.yaml
...

# Query - Plano intelligently routes to both agents in a single conversation
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "I want to travel from NYC to Paris next week. What is the weather like there, and can you find me some flights?"}
    ]
  }'
# → Plano routes to weather_agent for Paris weather ✓
# → Then routes to flight_agent for NYC → Paris flights ✓
# → Returns a complete travel plan with both weather info and flight options
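The same query can be issued from Python. This sketch uses only the standard library and builds the identical POST request. The listener port and request body are taken from the curl command above, while `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str, base_url: str = "http://localhost:8001"
) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command issues."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires Plano running locally):
# with urllib.request.urlopen(build_chat_request("Weather in Paris?")) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example testable without a running proxy.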

4. Get Observability and Model Agility for Free

Every request is traced end-to-end with OpenTelemetry - no instrumentation code needed.

Automatic Tracing
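The `random_sampling: 100` setting in the config reads naturally as the percentage of requests to trace. Below is a rough sketch of what percentage-based sampling means. It assumes those semantics and is not Plano's actual implementation; `should_sample` is a hypothetical name.

```python
import random


def should_sample(percent: float, rng: random.Random) -> bool:
    """Decide whether to trace one request, given a sampling percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # rng.random() is in [0.0, 1.0), so 100 always samples and 0 never does.
    return rng.random() * 100 < percent
```

Under this reading, a value of 100 captures every request, which suits local evaluation; a lower value would trade trace coverage for less overhead.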

What You Didn't Have to Build

| Infrastructure Concern | Without Plano | With Plano |
| --- | --- | --- |
| Agent Orchestration | Write intent classifier + routing logic | Declare agent descriptions in YAML |
| Model Management | Handle each provider's API quirks | Unified LLM APIs with state management |
| Rich Tracing | Instrument every service with OTEL | Automatic end-to-end traces and logs |
| Learning Signals | Build pipeline to capture/export spans | Zero-code agentic signals |
| Adding Agents | Update routing code, test, redeploy | Add to config, restart |

Why it's efficient: Plano uses purpose-built, lightweight LLMs (like our 4B-parameter orchestrator) instead of heavyweight frameworks or GPT-4 for routing - giving you production-grade routing at a fraction of the cost and latency.
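For contrast, here is roughly what the hand-written "intent classifier + routing logic" from the "Without Plano" column could look like. This is a deliberately naive keyword-overlap router, a stand-in technique and not how Plano's orchestrator model works. The keyword sets and the `route` function are hypothetical.

```python
import re

# Hypothetical keyword sets; a hand-maintained table like this is what
# the agent descriptions in config.yaml are meant to replace.
AGENT_KEYWORDS = {
    "weather_agent": {"weather", "rain", "forecast", "temperature"},
    "flight_agent": {"flight", "flights", "airport", "departure"},
}


def route(message: str) -> list[str]:
    """Return agents whose keyword set overlaps the message, best match first."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {agent: len(words & kws) for agent, kws in AGENT_KEYWORDS.items()}
    matched = [agent for agent, score in scores.items() if score > 0]
    return sorted(matched, key=lambda agent: scores[agent], reverse=True)
```

Every new agent or phrasing means editing and redeploying this table, which is the maintenance cost the declarative descriptions aim to remove.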


Contact

To get in touch with us, please join our Discord server. We actively monitor it and offer support there.

Getting Started

Ready to try Plano? Check out our comprehensive documentation.

Contribution

We would love feedback on our Roadmap, and we welcome contributions to Plano! Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated. Please visit our Contribution Guide for more details.


Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| 0.4.23 | ## What's Changed * ci: add zero-config smoke test for `planoai up` with no args by @adilhafeez in https://github.com/katanemo/plano/pull/919 * fix(brightstaff): enable TLS for redis session cache by @Spherrrical in https://github.com/katanemo/plano/pull/934 * ci+fix: add update-providers workflow + non-destructive fetch_models by @Spherrrical in https://github.com/katanemo/plano/pull/914 * Validate model listener filter references before serving traffic by @mukeshbaphna in https://github.co | High | 6/3/2026 |
| 0.4.22 | ## What's Changed * fix(anthropic-stream): avoid bare/duplicate message_stop on OpenAI upstream by @adilhafeez in https://github.com/katanemo/plano/pull/898 * fix: prevent index-out-of-bounds panic in signal analyzer follow-up by @adilhafeez in https://github.com/katanemo/plano/pull/896 * Add claude-opus-4-7 to anthropic provider models by @adilhafeez in https://github.com/katanemo/plano/pull/901 * Fix request closures during long-running streaming by @adilhafeez in https://github.com/katane | High | 4/24/2026 |
| 0.4.20 | ## What's Changed * add Plano agent skills framework and rule set by @Spherrrical in https://github.com/katanemo/plano/pull/797 * Add DigitalOcean as a first-class LLM provider by @adilhafeez in https://github.com/katanemo/plano/pull/889 * Zero-config planoai up: pass-through proxy with auto-detected providers by @adilhafeez in https://github.com/katanemo/plano/pull/890 * planoai obs: live LLM observability TUI by @adilhafeez in https://github.com/katanemo/plano/pull/891 * fix: passthrough_ | High | 4/18/2026 |
| 0.4.19 | ## What's Changed * Redis-backed session cache for cross-replica model affinity by @Spherrrical in https://github.com/katanemo/plano/pull/879 * use plano-orchestrator for LLM routing, remove arch-router by @adilhafeez in https://github.com/katanemo/plano/pull/886 * release 0.4.19 by @adilhafeez in https://github.com/katanemo/plano/pull/887 **Full Changelog**: https://github.com/katanemo/plano/compare/0.4.18...0.4.19 | High | 4/15/2026 |
| 0.4.18 | ## What's Changed * Add first-class Xiaomi provider support by @Spherrrical in https://github.com/katanemo/plano/pull/863 * Model affinity for consistent model selection in agentic loops by @adilhafeez in https://github.com/katanemo/plano/pull/827 * release 0.4.18 by @Spherrrical in https://github.com/katanemo/plano/pull/878 **Full Changelog**: https://github.com/katanemo/plano/compare/0.4.17...0.4.18 | High | 4/9/2026 |
| 0.4.17 | ## What's Changed * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/860 * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/862 * fix: resolve all open Dependabot security alerts by @adilhafeez in https://github.com/katanemo/plano/pull/866 * Publish docker images to DigitalOcean Container Registry by @adilhafeez in https://github.com/katanemo/plano/pull/868 | High | 4/3/2026 |
| 0.4.17-rc1 | ## What's Changed * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/860 * feat(web): merge DigitalOcean release announcement updates by @Spherrrical in https://github.com/katanemo/plano/pull/862 * fix: resolve all open Dependabot security alerts by @adilhafeez in https://github.com/katanemo/plano/pull/866 * Publish docker images to DigitalOcean Container Registry by @adilhafeez in https://github.com/katanemo/plano/pull/868 | Medium | 4/3/2026 |
| 0.4.16 | ## What's Changed * Update black hook for Python 3.14 by @Spherrrical in https://github.com/katanemo/plano/pull/857 * Polish planoai up/down CLI output by @Spherrrical in https://github.com/katanemo/plano/pull/858 * replace production panics with graceful error handling in common crate by @adilhafeez in https://github.com/katanemo/plano/pull/844 * fix: route Perplexity OpenAI endpoints without /v1 by @Spherrrical in https://github.com/katanemo/plano/pull/854 * Handle null prefer in inline r | Medium | 4/1/2026 |
| 0.4.15 | ## What's Changed * expand configuration reference with missing fields by @Spherrrical in https://github.com/katanemo/plano/pull/851 * model routing: cost/latency ranking with ranked fallback list by @adilhafeez in https://github.com/katanemo/plano/pull/849 * restructure model_metrics_sources to type + provider by @adilhafeez in https://github.com/katanemo/plano/pull/855 * release 0.4.15 by @adilhafeez in https://github.com/katanemo/plano/pull/853 **Full Changelog**: https://github.com/ | Medium | 3/31/2026 |
| 0.4.14 | ## What's Changed * the orchestrator had a bug where it was setting the wrong headers for… by @salmanap in https://github.com/katanemo/plano/pull/839 * release 0.4.14 by @adilhafeez in https://github.com/katanemo/plano/pull/840 **Full Changelog**: https://github.com/katanemo/plano/compare/0.4.13...0.4.14 | Low | 3/20/2026 |
| 0.4.13 | ## Highlights **Key Features** - **Output Filter Chain**: Adds support for output filter chains, enabling guardrails and processing on LLM responses in addition to the existing input/prompt filters (#822) - **Kubernetes Deployment Support**: New K8s manifests and docs for self-hosted Arch-Router (vLLM) with GPU support, including deployment YAMLs and in-cluster routing config (#831) **Improvements** - **Brightstaff Refactor**: Extracted `AppState` struct, cleaner error propagation, graceful | Low | 3/20/2026 |
| 0.4.12 | ## Highlights ### Key Features - **Codex support**: Plano now works with OpenAI Codex (#808) - **Routing service**: New routing service + support for inline `routing_policy` in request bodies (#814, #815) - **Unified model overrides**: Single config for custom router and orchestrator models (#820) - **New supported models**: Added new LLM models to Plano (#829) ### Improvements - **Native mode logs**: `planoai logs` now works in native (non-Docker) mode (#807) | Low | 3/15/2026 |
| 0.4.11 | ## Highlights **Native mode is now the default**: `uv tool install planoai` just works. No Docker, no Rust toolchain, no repo clone needed. Pre-compiled binaries (Envoy, WASM plugins, brightstaff) are automatically downloaded on first run. - Run with Docker using `planoai up config.yaml --docker` (opt-in) - Download progress bars show real-time status for binary downloads **Full Changelog**: https://github.com/katanemo/plano/compare/0.4.10...0.4.11 ## What's Changed * fix: strip t | Low | 3/5/2026 |
| 0.4.9 | ## What's Changed * Make model field optional with default provider fallback by @adilhafeez in https://github.com/katanemo/plano/pull/768 * Add workflow preferences to CLAUDE.md by @adilhafeez in https://github.com/katanemo/plano/pull/770 * updating architecture diagram by @salmanap in https://github.com/katanemo/plano/pull/774 * [ISSUE 706]: Standardize | Low | 2/27/2026 |
| 0.4.8 | ## What's Changed * Add OpenClaw + Plano intelligent routing demo by @adilhafeez in https://github.com/katanemo/plano/pull/761 * sync CLI templates with demo configs via manifest + CI flow by @Spherrrical in https://github.com/katanemo/plano/pull/764 * docs: Fix incorrect routing preferences in OpenClaw demo config by @tejasunku in https://github.com/katanemo/plano/pull/765 * Upstream TLS validation and configurable connect timeout by @adilhafeez in https://github.com/katanemo/plano/pull/766 | Low | 2/18/2026 |
| 0.4.7 | ## What's Changed * Supporting OpenClaw routing via Plano Plus many quality of improvement updates * Add CLAUDE.md for Claude Code onboarding by @adilhafeez in https://github.com/katanemo/plano/pull/743 * Upgrade Python base images to 3.13.11 to fix CVE-2025-13836 by @adilhafeez in https://github.com/katanemo/plano/pull/751 * updated the mo | Low | 2/17/2026 |
| 0.4.6 | ## What's Changed * Site clean by @salmanap in https://github.com/katanemo/plano/pull/716 * add logo carousel to katanemo.com to showcase companies by @Spherrrical in https://github.com/katanemo/plano/pull/718 * Removing duplicate lines by @san81 in https://github.com/katanemo/plano/pull/719 * upgrade rust to 1.93.0 and fix pre-commit by @adilhafeez in https://github.com/katanemo/plano/pull/720 * fixing the README for multi-agent orchestration by @salmanap in https://github.com/katanemo/pl | Low | 2/11/2026 |
| 0.4.4 | ## What's Changed * add default agent schema enforcement by @adilhafeez in https://github.com/katanemo/plano/pull/702 * add ability to set agent timeout by @adilhafeez in https://github.com/katanemo/plano/pull/710 * Adding support for wildcard models in the model_providers config by @salmanap in https://github.com/katanemo/plano/pull/696 * introduce SEO optimization and improve blog content rendering by @Spherrrical in https://github.com/katanemo/plano/pull/709 * fixing the build scripts fo | Low | 1/29/2026 |
| 0.4.3 | ## What's Changed * tweaks to web and docs to align to 0.4.2 by @salmanap in https://github.com/katanemo/plano/pull/680 * remove unnecessary clones from code by @adilhafeez in https://github.com/katanemo/plano/pull/682 * Bump next from 16.0.0 to 16.0.10 in /packages/ui by @dependabot[bot] in https://github.com/katanemo/plano/pull/684 * don't include internal models in /v1/models endpoint by @adilhafeez in https://github.com/katanemo/plano/pull/685 * http-filter: add fully http based demo (r | Low | 1/18/2026 |
| 0.4.2 | ## What's Changed * Revert "release 0.4.1" by @adilhafeez in https://github.com/katanemo/plano/pull/669 * release 0.4.1 by @adilhafeez in https://github.com/katanemo/plano/pull/670 * update quick start to elevate gateway/proxy example by @adilhafeez in https://github.com/katanemo/plano/pull/671 * simplify readme and point links to docs.planoai.dev by @adilhafeez in https://github.com/katanemo/plano/pull/672 * React: Security vulnerabilities resolved by @Spherrrical in https://github.com/kat | Low | 1/7/2026 |
| 0.4.1 | ## What's Changed * update mcp_filter docs and talk about docker build and jaeger ui by @adilhafeez in https://github.com/katanemo/plano/pull/652 * add open-web-ui-ref to mcp_filter demo readme by @adilhafeez in https://github.com/katanemo/plano/pull/653 * restructure cli by @adilhafeez in https://github.com/katanemo/plano/pull/656 * publish planoai package from gh action by @adilhafeez in https://github.com/katanemo/plano/pull/657 * update workspace by @adilhafeez in https://github.com/kat | Low | 12/28/2025 |
| 0.4.0 | ## What's Changed * Improve end to end tracing by @salmanap in https://github.com/katanemo/plano/pull/628 * fixed mixed inputs from openai v1/responses api by @salmanap in https://github.com/katanemo/plano/pull/632 * enable state management for v1/responses by @salmanap in https://github.com/katanemo/plano/pull/631 * orchestration integration by @nehcgs in https://github.com/katanemo/plano/pull/623 * Use mcp tools for filter chain by @adilhafeez in https://github.com/katanemo/plano/pull/621 | Low | 12/24/2025 |
| 0.3.22 | ## What's Changed * handle agent error better by @adilhafeez in https://github.com/katanemo/archgw/pull/627 * release 0.3.22 by @adilhafeez in https://github.com/katanemo/archgw/pull/629 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.21...0.3.22 | Low | 12/11/2025 |
| 0.3.21 | ## What's Changed * Add support for v1/responses API by @salmanap in https://github.com/katanemo/archgw/pull/622 * release 0.3.21 by @adilhafeez in https://github.com/katanemo/archgw/pull/626 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.20...0.3.21 | Low | 12/4/2025 |
| 0.3.20 | ## What's Changed * removing model_server python module to brightstaff (function calling) by @salmanap in https://github.com/katanemo/archgw/pull/615 * removing model_server. buh bye by @salmanap in https://github.com/katanemo/archgw/pull/619 * release 0.3.20 by @adilhafeez in https://github.com/katanemo/archgw/pull/620 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.18...0.3.20 | Low | 11/23/2025 |
| 0.3.18 | ## What's Changed * fixing a bug where by we were writing the cluster_name for an upstrea… by @salmanap in https://github.com/katanemo/archgw/pull/607 * support base_url path for model providers by @salmanap in https://github.com/katanemo/archgw/pull/608 * support python 3.14 by @branchvincent in https://github.com/katanemo/archgw/pull/605 * release 0.3.18 by @adilhafeez in https://github.com/katanemo/archgw/pull/611 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.17. | Low | 10/31/2025 |
| 0.3.17 | ## What's Changed * fix console logs by @adilhafeez in https://github.com/katanemo/archgw/pull/598 * fix config generator bug by @adilhafeez in https://github.com/katanemo/archgw/pull/599 * fixed bug in Bedrock translation code and dramatically improved tracing for outbound LLM traffic by @salmanap in https://github.com/katanemo/archgw/pull/601 * move pytest | Low | 10/25/2025 |
| 0.3.16 | ## What's Changed * remove proxy-wasm integration tests by @adilhafeez in https://github.com/katanemo/archgw/pull/580 * stream access logs and improve access log format by @adilhafeez in https://github.com/katanemo/archgw/pull/581 * renaming branch by @salmanap in https://github.com/katanemo/archgw/pull/582 * adding support for Qwen models and fixed issue with passing PATH vari… by @salmanap in https://github.com/katanemo/archgw/pull/583 * fixing docs by @salmanap in https://github.com/kata | Low | 10/22/2025 |
| 0.3.15 | ## What's Changed * adding support for moonshot and z-ai by @salmanap in https://github.com/katanemo/archgw/pull/578 * release 0.3.15 by @adilhafeez in https://github.com/katanemo/archgw/pull/579 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.14...0.3.15 | Low | 9/30/2025 |
| 0.3.14 | ## What's Changed * fixed changes related to max_tokens and processing http error codes l… by @salmanap in https://github.com/katanemo/archgw/pull/574 * adding support for claude code routing by @salmanap in https://github.com/katanemo/archgw/pull/575 * fixing README for claude code and adding a helper script to show mode… by @salmanap in https://github.com/katanemo/archgw/pull/576 * release 0.3.14 by @adilhafeez in https://github.com/katanemo/archgw/pull/577 **Full Changelog**: https:/ | Low | 9/30/2025 |
| 0.3.13 | ## What's Changed * add default implementation for common openai types by @adilhafeez in https://github.com/katanemo/archgw/pull/568 * adding code snippets in a single place for newsletter by @salmanap in https://github.com/katanemo/archgw/pull/569 * draft commit to add support for xAI, TogehterAI, AzureOpenAI by @salmanap in https://github.com/katanemo/archgw/pull/570 * Salmanap/fix docs new providers model alias by @salmanap in https://github.com/katanemo/archgw/pull/571 * release 0.3.13 | Low | 9/19/2025 |
| 0.3.12 | ## What's Changed * adding support for model aliases in archgw by @salmanap in https://github.com/katanemo/archgw/pull/566 * release 0.3.12 by @adilhafeez in https://github.com/katanemo/archgw/pull/567 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.11...0.3.12 | Low | 9/16/2025 |
| 0.3.11 | ## What's Changed * updating the implementation of /v1/chat/completions to use the generi… by @salmanap in https://github.com/katanemo/archgw/pull/548 * updating readme and see how it flows by @salmanap in https://github.com/katanemo/archgw/pull/556 * add support for v1/messages and transformations by @salmanap in https://github.com/katanemo/archgw/pull/558 * release 0.3.11 by @adilhafeez in https://github.com/katanemo/archgw/pull/565 **Full Changelog**: https://github.com/katanemo/arch | Low | 9/12/2025 |
| 0.3.10 | ## What's Changed * publish to ghrc by @adilhafeez in https://github.com/katanemo/archgw/pull/553 * update base image to python3.13 by @adilhafeez in https://github.com/katanemo/archgw/pull/554 * release 0.3.10 by @adilhafeez in https://github.com/katanemo/archgw/pull/555 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.9...0.3.10 | Low | 8/13/2025 |
| 0.3.9 | ## What's Changed * fix cve_2025-6020 by removing libpam by @adilhafeez in https://github.com/katanemo/archgw/pull/551 * release 0.3.9 by @adilhafeez in https://github.com/katanemo/archgw/pull/552 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.8...0.3.9 | Low | 8/12/2025 |
| 0.3.8 | ## What's Changed * Fix code block formatting in LLM Provider documentation by @Spherrrical in https://github.com/katanemo/archgw/pull/543 * archgw_model_server: use sys.executable for uv tool install compat by @kafonek in https://github.com/katanemo/archgw/pull/544 * consistent messaging by @salmanap in https://github.com/katanemo/archgw/pull/546 * pushing new apis module for hermes by @salmanap in https://github.com/katanemo/archgw/pull/547 * update torch==2.6.0 by @adilhafeez in https:// | Low | 8/11/2025 |
| 0.3.7 | ## What's Changed * bug fix - allow image content to pass through by @adilhafeez in https://github.com/katanemo/archgw/pull/539 * release 0.3.7 by @adilhafeez in https://github.com/katanemo/archgw/pull/542 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.6...0.3.7 | Low | 7/26/2025 |
| 0.3.6 | ## What's Changed * In request path use same format for usage preferences as arch_config by @adilhafeez in https://github.com/katanemo/archgw/pull/533 * release 0.3.6 by @adilhafeez in https://github.com/katanemo/archgw/pull/536 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.3.5...0.3.6 | Low | 7/22/2025 |
| 0.3.5 | ## What's Changed * updating the messaging to call ourselves the edge and AI gateway for … by @salmanap in https://github.com/katanemo/archgw/pull/527 * chatgpt.com updated its backend api path. fixing by @salmanap in https://github.com/katanemo/archgw/pull/530 * pass model name in header when a route is selected when using usage p… by @adilhafeez in https://github.com/katanemo/archgw/pull/531 * refactor logging in brightstaff by @adilhafeez in https://github.com/katanemo/archgw/pull/532 * | Low | 7/21/2025 |
| 0.3.4 | # What's Changed ## Breaking changes ### arch_config file format change In llm_providers section we now allow model to contain provider as part of the model definition. This is to simplify the llm_providers section and to allow more concise way to defining providers, Here is a sample [llm_provider definition](https://github.com/katanemo/archgw/blob/main/demos/use_cases/llm_routing/arch_config.yaml#L12-L17) after this change, ``` - access_key: $OPENAI_API_KEY model: openai/ | Low | 7/12/2025 |
| 0.3.3 | ## What's Changed * pushing docs updated by @salmanap in https://github.com/katanemo/archgw/pull/508 * local support for Arch-Router via Ollama by @salmanap in https://github.com/katanemo/archgw/pull/509 * updating the REAMDE to reflect preference based routing and clean up … by @salmanap in https://github.com/katanemo/archgw/pull/512 * Add support for updating model preferences by @adilhafeez in https://github.com/katanemo/archgw/pull/510 * make arch-router cluster optional by @adilhafeez | Low | 7/8/2025 |
| 0.3.2 | ## What's Changed * update readme for preference based routing by @adilhafeez in https://github.com/katanemo/archgw/pull/496 * use consistent version across all arch_config files by @adilhafeez in https://github.com/katanemo/archgw/pull/497 * don't run docker compose up for preference based router e2e demo tests by @adilhafeez in https://github.com/katanemo/archgw/pull/499 * Add ARCH_API_KEY in preference based routing demo by @adilhafeez in https://github.com/katanemo/archgw/pull/498 * Upd | Low | 6/14/2025 |
| 0.3.1 | ## What's Changed * add claude-4 in llm_routing demo by @adilhafeez in https://github.com/katanemo/archgw/pull/486 * trim conversation if it exceed max limit of what router model can handle by @adilhafeez in https://github.com/katanemo/archgw/pull/488 * add compress/decompress filter to llm listener by @adilhafeez in https://github.com/katanemo/archgw/pull/489 * add support for openwebui by @adilhafeez in https://github.com/katanemo/archgw/pull/487 * use provider_name as model_id /v1/models | Low | 5/31/2025 |
| 0.3.0 | ## What's Changed * use separate host port for chat ui and for app_server by @adilhafeez in https://github.com/katanemo/archgw/pull/473 * updating README based on reddit feedback by @salmanap in https://github.com/katanemo/archgw/pull/474 * update arch_config sample on readme to match with new format by @adilhafeez in https://github.com/katanemo/archgw/pull/475 * Introduce brightstaff a new terminal service for llm routing by @adilhafeez in https://github.com/katanemo/archgw/pull/477 * add | Low | 5/23/2025 |
| 0.2.8 | ## What's Changed * use archfc v1.1 on archfc.katanemo.dev by @adilhafeez in https://github.com/katanemo/archgw/pull/471 * release 0.2.8 by @adilhafeez in https://github.com/katanemo/archgw/pull/472 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.2.7...0.2.8 | Low | 4/22/2025 |
| 0.2.7 | ## What's Changed * publish docker images for every release we cut - https://github.com/katanemo/archgw/compare/c7c0553427d4e00b0254838d1a7fc0e9567645a4...6d6c03a7e81f99db4f67c06c158fd2bf6d8de660 * release 0.2.7 by @adilhafeez in https://github.com/katanemo/archgw/pull/469 **Full Changelog**: https://github.com/katanemo/archgw/compare/0.2.6...0.2.7 | Low | 4/16/2025 |
| 0.2.6.5 | Release 0.2.6.5 | Low | 4/16/2025 |
| 0.2.6.4 | Release 0.2.6.4 | Low | 4/16/2025 |
| 0.2.6.3 | Release 0.2.6.3 | Low | 4/16/2025 |
| 0.2.6.2 | Release 0.2.6.2 | Low | 4/16/2025 |
| 0.2.6.1 | Release 0.2.6.1 | Low | 4/16/2025 |
| 0.2.6 | ## What's Changed * [chore] Tweak readme docs for minor nits by @darkdatter in https://github.com/katanemo/archgw/pull/461 * fixed issue with groq LLMs that require the openai in the /v1/chat/co… by @salmanap in https://github.com/katanemo/archgw/pull/460 * Integrate Arch-Function-Chat by @nehcgs in https://github.com/katanemo/archgw/pull/449 * release 0.2.6 by @adilhafeez in https://github.com/katanemo/archgw/pull/463 ## New Contributors * @darkdatter made their first contribution in ht | Low | 4/15/2025 |


Similar Packages

bifrost (ent-v1.4.6-vk-fix-base): Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
tensorzero (2026.6.0): TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
edgee (v0.2.7): Open-source AI gateway written in Rust, with token compression for Claude Code, Codex... and any other LLM client.
control-layer (v8.58.1): The world’s fastest AI model gateway (450x less overhead than LiteLLM). Unified access to LLMs across endpoints (openAI, self-hosted, etc.) behind a single authentication layer - with API key generati
hub (0.9.2): High-scale LLM gateway, written in Rust. OpenTelemetry-based observability included

More in Infrastructure

tensorzero: TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
models: This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.
edgee: Open-source AI gateway written in Rust, with token compression for Claude Code, Codex... and any other LLM client.
patent_mcp_server: FastMCP Server for USPTO data