tensorzero

TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.

ai ai-engineering anthropic artificial-intelligence deep-learning genai generative-ai gpt rust

Why this rank: Strong adoption · Release freshness · Healthy release cadence

README

TensorZero

GitHub Trending - #1 Repository Of The Day

TensorZero is an open-source LLMOps platform that unifies:

Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
Observability: store inferences and feedback in your database, available programmatically or in the UI
Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

You can take what you need, adopt incrementally, and complement with other tools. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today.

Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Deployment Guide · API Reference · Configuration Reference

Demo

tensorzero-demo.mp4

Features

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.

It dramatically improves the performance of LLM agents across diverse tasks:

Bar chart showing baseline vs. optimized scores across diverse LLM tasks

Learn more → Schedule a demo →

🌐 LLM Gateway

Integrate with TensorZero once and access every major LLM provider.

Call any LLM (API or self-hosted) through a single unified API
Infer with tool use, structured outputs (JSON), batch, embeddings, multimodal (images, files), caching, etc.
Create prompt templates and schemas to enforce a structured interface between your application and the LLMs
Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: <1ms p99 latency overhead at 10k+ QPS
Ensure high availability with routing, retries, fallbacks, load balancing, granular timeouts, etc.
Track usage and cost and enforce custom rate limits with granular scopes (e.g. tags)
Set up auth for TensorZero to allow clients to access models without sharing provider API keys
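
The retry and fallback behavior listed above can be pictured with a small sketch. This is not TensorZero's implementation (the gateway handles this through its own configuration); `call_with_fallbacks`, `flaky_primary`, and `stable_backup` are hypothetical names used only to illustrate the idea of retrying one provider and then falling back to the next.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallbacks(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> str:
    """Try each provider in order, retrying each one before moving on."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Exponential backoff before the next attempt.
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-ins for real provider calls.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


result = call_with_fallbacks([flaky_primary, stable_backup], "hello")
print(result)
```

Doing this inside a gateway rather than in application code keeps the retry policy in one place, which is the design choice the list above describes.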

Supported Model Providers

Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI (Grok).

Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).

Usage Example

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

Deploy the TensorZero Gateway (one Docker container).
Update the base_url and model in your OpenAI-compatible client.
Run inference:

from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Share a fun fact about TensorZero.",
        }
    ],
)

See Quick Start for more information.

🔍 LLM Observability

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.

Store inferences and feedback (metrics, human edits, etc.) in your own database
Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
Build datasets for optimization, evaluation, and other workflows
Replay historical inferences with new prompts, models, inference strategies, etc.
Export OpenTelemetry traces (OTLP) and export Prometheus metrics to your favorite application observability tools
Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling

📈 LLM Optimization

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.

Optimize your models with supervised fine-tuning, RLHF, and other techniques
Optimize your prompts with automated prompt engineering algorithms like GEPA
Optimize your inference strategy with dynamic in-context learning, best/mixture-of-N sampling, etc.
Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
Soon: synthetic data generation
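
As a rough illustration of best-of-N sampling, the sketch below generates several candidates and keeps the one a scoring function prefers. It is a generic picture of the technique, not TensorZero's implementation; `best_of_n`, `fake_generate`, and `fake_score` are hypothetical, and a real system would call an LLM for both generation and judging.

```python
from typing import Callable, List


def best_of_n(
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int = 4,
) -> str:
    """Generate n candidates and return the highest-scoring one."""
    candidates: List[str] = [generate(prompt, i) for i in range(n)]
    return max(candidates, key=lambda candidate: score(prompt, candidate))


# Hypothetical stand-ins: canned candidates and a fixed preference table.
CANNED_MOVES = ["a3", "e4", "h4", "Nf3"]
MOVE_SCORES = {"a3": 0.1, "e4": 0.9, "h4": 0.05, "Nf3": 0.8}


def fake_generate(prompt: str, index: int) -> str:
    return CANNED_MOVES[index % len(CANNED_MOVES)]


def fake_score(prompt: str, candidate: str) -> float:
    return MOVE_SCORES[candidate]


best = best_of_n(fake_generate, fake_score, "Pick an opening move.", n=4)
print(best)
```

The trade-off is N times the generation cost plus the judging cost in exchange for a better expected output.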

📊 LLM Evaluation

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.

Evaluate individual inferences with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs)
Evaluate end-to-end workflows with workflow evaluations with complete flexibility (≈ integration tests for LLMs)
Optimize LLM judges just like any other TensorZero function to align them to human preferences
Soon: more built-in evaluators; headless evaluations
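
To make the "unit tests for LLMs" analogy concrete, here is a minimal sketch of a heuristic evaluator that scores outputs by exact match against references and averages the result. The function names and the sample data are hypothetical and chosen only for illustration; TensorZero's own evaluators are set up through its configuration.

```python
from statistics import mean
from typing import List, Tuple


def exact_match(output: str, reference: str) -> bool:
    """Heuristic evaluator: the output must equal the reference after trimming."""
    return output.strip() == reference.strip()


def evaluate(pairs: List[Tuple[str, str]]) -> float:
    """Return the fraction of (output, reference) pairs that match exactly."""
    return mean(1.0 if exact_match(out, ref) else 0.0 for out, ref in pairs)


# Hypothetical model outputs paired with reference answers.
pairs = [
    ("Paris", "Paris"),
    ("paris", "Paris"),      # case differs, so this does not match
    (" Berlin ", "Berlin"),  # whitespace is trimmed, so this matches
    ("Rome", "Madrid"),
]
score = evaluate(pairs)
print(f"exact_match: {score:.2f} (n={len(pairs)})")
```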

Evaluation » CLI

docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5

Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)

🧪 LLM Experimentation

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Run adaptive A/B tests to ship with confidence and identify the best prompts and models for your use cases.
Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.
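
One way to picture an adaptive A/B test is a bandit that shifts traffic toward the variant that appears to perform better. The sketch below uses Thompson sampling as a stand-in technique; the source does not say which algorithm TensorZero uses, and the variant names, success rates, and function names are all hypothetical.

```python
import random
from typing import Dict


def thompson_pick(stats: Dict[str, Dict[str, int]], rng: random.Random) -> str:
    """Sample a plausible success rate per variant and pick the highest draw."""
    draws = {
        name: rng.betavariate(s["successes"] + 1, s["failures"] + 1)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)


def simulate(true_rates: Dict[str, float], rounds: int, seed: int = 0) -> Dict[str, int]:
    """Route simulated requests adaptively and count traffic per variant."""
    rng = random.Random(seed)
    stats = {name: {"successes": 0, "failures": 0} for name in true_rates}
    traffic = {name: 0 for name in true_rates}
    for _ in range(rounds):
        variant = thompson_pick(stats, rng)
        traffic[variant] += 1
        # Simulated feedback: success with the variant's (hidden) true rate.
        if rng.random() < true_rates[variant]:
            stats[variant]["successes"] += 1
        else:
            stats[variant]["failures"] += 1
    return traffic


traffic = simulate({"prompt_a": 0.5, "prompt_b": 0.8}, rounds=2000)
print(traffic)
```

Compared with a fixed 50/50 split, an adaptive scheme spends fewer requests on the weaker variant while the experiment is still running.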

& more!

Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.

Build simple applications or massive deployments with GitOps-friendly orchestration
Extend TensorZero with built-in escape hatches, programmatic-first usage, direct database access, and more
Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
Iterate quickly by experimenting with prompts interactively using the Playground UI

Frequently Asked Questions

How is TensorZero different from other LLM frameworks?

TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

Can I use TensorZero with ___?

Yes. Every major programming language is supported. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

Is TensorZero production-ready?

Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today.

Here's a case study: Automating Code Changelogs at a Large Bank with LLMs

How much does TensorZero cost?

TensorZero (LLMOps platform) is 100% self-hosted and open-source.

TensorZero Autopilot (automated AI engineer) is a complementary paid product powered by TensorZero.

Who is building TensorZero?

Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and coverage from VentureBeat. We're hiring in NYC.

How do I get started?

You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.

Get Started

Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero.

Questions? Ask us on Slack or Discord.

Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero

This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.

Agentic RAG — Multi-Hop Question Answering with LLMs

This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.

Image Data Extraction — Multimodal (Vision) Fine-tuning

This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

Blog Posts

We write about LLM engineering on the TensorZero Blog.

Release History

Version	Changes	Urgency	Date
2026.6.0	> [!CAUTION] > Security Advisory > > This release fixed a high-risk vulnerability affecting the TensorZero Gateway. > > Please refer to the security advisory for more details: https://github.com/tensorzero/tensorzero/security/advisories/GHSA-824w-x939-6cmc	High	6/4/2026
2026.5.2	New Features - Accept both strings and array of strings for `stop` in the OpenAI-compatible inference endpoint (thanks @pragnyanramtha). - Emit additional OpenInference attributes for Arize compatibility.	High	5/20/2026
2026.5.1	Bug Fixes - Treat SSE body decoding errors as fatal.	High	5/15/2026
2026.5.0	> [!CAUTION] > Breaking Changes > > - The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage. New Features - Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks). _& multiple under-the-hood and UI improvements (thanks @arisp)_	High	5/8/2026
2026.4.1	> [!CAUTION] > Breaking Changes > > - The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set `observability.async_writes = false`. [[docs]](https://www.tensorzero.com/docs/gateway/configuration-reference) > [!WARNING] > Deprecations > > - Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that	High	4/24/2026
2026.4.0	New Features - Add an MCP server to the gateway exposing its API in `/mcp`. - Report provider prompt caching statistics via API and UI. - Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI. - Add the Prometheus metrics `tensorzero_input_tokens_total` and `tensorzero_output_tokens_total`. - Add configuration field `content_type_overrides` to handle file inputs for long-tail providers. _& multiple under-the-hood and UI improvement	High	4/2/2026
2026.3.4	> [!WARNING] > Planned Deprecations > > - The configuration for inference evaluations should be nested under the relevant functions moving forward [[docs]](https://www.tensorzero.com/docs/evaluations/inference-evaluations/tutorial). You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release. > ``` > [functions.write_haiku.evaluators.exact_match] > type = "exact_match" > ``` > - The legacy implementa	Medium	3/26/2026
2026.3.3	Bug Fixes - Fixed two edge cases affecting batch inference. - Fixed a UI bug affecting "Try with..." with inputs that include base64 files. - Removed assistant message prefill for JSON functions + Anthropic (deprecated by Anthropic). New Features - Added an implementation of GEPA (automated prompt engineering) based on durable workflows. - Allow users to specify duplicate tool calls in `all_of` tool evaluators to evaluate parallel tool calling. - Allow users to specify an ex	Low	3/18/2026
2026.3.2	Bug Fixes - Fixed a UI issue that prevented certain pages from rendering when depending on historical configuration. New Features - Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS. - Added the `openrouter::xxx` short-hand for embedding models. - Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabl	Low	3/13/2026
2026.3.1	> [!WARNING] > Completed Deprecations > > - Removed the deprecated `model_provider_name` filter for `extra_body` and `extra_headers`. Please use `model_name` and `provider_name` instead. > - Removed the legacy experimental `list_inferences` endpoint and method. Please use the new endpoint instead. [[docs]](https://www.tensorzero.com/docs/observability/query-historical-inferences) > - Removed several long-deprecated types and methods from the TensorZero Python SDK. > [!WARNING] >	Low	3/5/2026
2026.3.0	> [!WARNING] > Completed Deprecations > > - The deprecated Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` was removed. Use `tensorzero_inference_latency_overhead_seconds` instead. > [!WARNING] > Planned Deprecations > > - The configuration for experimentation (e.g. `static_weights`, `track_and_stop`) was simplified. The old notation will be removed in a future release. See **[Run adaptive A/B tests](https://www.tensorzero.com/docs/experimentation/run	Low	3/4/2026
2026.2.2	> [!CAUTION] > Breaking Changes > > - The `--config-file` globbing behavior has changed: single-level wildcards (`*`) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (`**`). This aligns the behavior with standard glob semantics. For example: > - `--config-file *.toml` matches `tensorzero.toml`, but not `subdir/tensorzero.toml`. > - `--config-file **/*.toml` matches both `tensorzero.toml` and `subdir/tensorzero.to	Low	2/26/2026
2026.2.1	> [!CAUTION] > Breaking Changes > > - The default value for `cache_options.enabled` changed from `write_only` to `off`. New Features - Support reasoning models from Groq, Mistral, and vLLM. - Support multi-turn reasoning with Gemini and OpenAI-compatible models. - Support embedding models from Together AI. - Add configurable `total_ms` timeout to streaming inferences. - Display charts with top-k evaluation results in the TensorZero Autopilot UI. - Add "Ask Autopilot" button	Low	2/16/2026
2026.2.0	> [!WARNING] > Planned Deprecations > > - Anthropic's structured output feature is out of beta, so the TensorZero configuration field `beta_structured_outputs` is now ignored and deprecated. It'll be removed in a future release. Bug Fixes - Fix a regression in the `aws_bedrock` provider that affected long-term bearer API keys. - Fix a horizontal overflow issue for tool calls and results in the inference detail UI page. New Features - Add YOLO Mode for TensorZero Autop	Low	2/5/2026
2026.1.8	Bug Fixes - Fix a race condition in the TensorZero Autopilot UI that could disable the chat input. - Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations). _& multiple under-the-hood and UI improvements!_	Low	1/30/2026
2026.1.7	New Features - [Preview] TensorZero Autopilot — an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. [Learn more →](https://www.tensorzero.com/) [Join the waitlist →](https://tensorzero.com/autopilot-waitlist) - Support multi-turn reasoning for xAI (`reasoning_content` only). _& multiple under-the-hood and UI improvements!_	Low	1/30/2026
2026.1.6	> [!CAUTION] > Breaking Changes > > - Moving forward, TensorZero will use the OpenAI API's error format (`{"error": {"message": "Bad!"}}`) instead of TensorZero's error format (`{"error": "Bad!"}`) in the OpenAI-compatible endpoints. > [!WARNING] > Planned Deprecations > > - When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZe	Low	1/30/2026
2026.1.5	> [!CAUTION] > Breaking Changes > > - TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`. > [!WARNING] > Planned Deprecations > > - Migrate `include_original_response` to `include_raw_re	Low	1/24/2026
2026.1.2	New Features - Support appending to arrays with `extra_body` using the `/my_array/-` notation. - Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio. _& multiple under-the-hood and UI improvements (thanks @ecalifornica!)_	Low	1/15/2026
2026.1.1	> [!WARNING] > Planned Deprecations > > - In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime. Bug Fixes - Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint. - Stop reporting zero usage in every chunk when streaming a cached infe	Low	1/14/2026
2026.1.0	> [!CAUTION] > Breaking Changes > > - The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms). > [!WARNING] > Planned Deprecations > > - Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the ga	Low	1/10/2026
2025.12.6	> [!CAUTION] > Breaking Changes > > - Migrated the following optimization fields from the TensorZero Python SDK to the configuration: > - `DICLOptimizationConfig`: removed `credential_location`. > - `FireworksSFTConfig`: moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`. > - `GCPVertexGeminiSFTConfig`: moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[prov	Low	12/26/2025
2025.12.5	> [!WARNING] > Planned Deprecations > > - The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities. > - The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead. New Features - Expand the dataset builder in the UI to support complex querie	Low	12/23/2025
2025.12.3	Bug Fixes - Fix a bug where negative tag filters (e.g. `user_id != 1`) matched inferences and datapoints without that tag. - Fix a bug where metric filters covering default values (e.g. `exact_match = false`) matched inferences without that metric. - Fix a regression affecting the logger in the UI. New Features - Improve the performance of the inference and datapoint list pages in the UI. - Support filtering inferences by whether they have a demonstration. _& multiple unde	Low	12/17/2025
2025.12.2	Bug Fixes - Fix a performance regression affecting the inference table in the UI. New Features - Allow users to customize the log level in the UI (`TENSORZERO_UI_LOG_LEVEL`). _& multiple under-the-hood and UI improvements_	Low	12/12/2025
2025.12.1	Bug Fixes - Fixed a regression that broke the dataset builder in the UI. _& multiple under-the-hood and UI improvements_	Low	12/12/2025
2025.12.0	> [!CAUTION] > Breaking Changes > > - Unknown content blocks now return the scope as `model_name` and `provider_name` instead of the fully-qualified `model_provider_name`. > [!WARNING] > Planned Deprecations > > - The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored. You no longer need to mount the config	Low	12/11/2025
2025.11.6	Bug Fixes - Handle a regression in ClickHouse `latest` that affected the endpoint for deleting datapoints. New Features - Support running evaluations programmatically on specific datapoints (`datapoint_ids`). - Generate `values.schema.json` for the Helm chart. (thanks @Erin-Boehmer!)	Low	11/27/2025
2025.11.5	> [!CAUTION] > Breaking Changes > > - Moving forward, explicit `tensorzero::params` will take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint. > [!WARNING] > Planned Deprecations > > - Rename `json_mode="implicit_tool"` to `json_mode="tool"`. > - Set `model_name` and optionally `provider_name` instead of `model_provider_name` in `extra_body` and `extra_headers` objects supplied at inference time. Alternatively, don't include a s	Low	11/21/2025
2025.11.4	> [!CAUTION] > Breaking Changes > > - Moving forward, `allowed_tools` must include dynamic tools (tools specified at inference time rather than in configuration). This matches the OpenAI API behavior. Previously, TensorZero assumed that dynamic tools were always allowed. > [!WARNING] > Planned Deprecations > > - Use `limit` instead of `page_size` with the programmatic observability methods. Previously, the methods mixed these two fields. > - Don't nest fields in `metadata` or `	Low	11/19/2025
2025.11.3	Bug Fixes - Enable TLS support for Postgres connections. - Fix handling of user-defined tags in batch inference. _& multiple under-the-hood and UI improvements_	Low	11/11/2025
2025.11.2	> [!CAUTION] > Breaking Changes > > - Moving forward, the gateway will attempt any `fallback_variants` in order rather than randomly sample them. Bug Fixes - Fix a bug that prevented some model inferences from being rendered correctly in the UI. - Handle non-image base64 file inputs consistently in the OpenAI-compatible inference endpoint. - Handle `raw_response` correctly for batch inference with GCP Vertex AI Gemini. New Features - Apply the `tensorzero::api_key_pu	Low	11/6/2025
2025.11.1	Bug Fixes - Fix a regression that prevented batch inferences from being rendered in the UI. - Handle missing Postgres credentials gracefully in the UI. New Features - Support rate limiting by API key (`api_key_public_id`). - Add native `service_tier` inference parameter (supported providers: Anthropic, Azure, Groq, OpenAI). `extra_body` is no longer necessary. - Add native `detail` parameter for input images (supported providers: Azure, OpenAI, xAI). `extra_body` is no longer	Low	11/5/2025
2025.11.0	> [!WARNING] > Completed Deprecations > > - Completed the planned deprecation of the configuration field `enable_template_filesystem_access` in favor of `template_filesystem_access.enabled`. Bug Fixes - Handle the `global` region correctly for GCP Vertex Anthropic. - Fix `output` format for JSON functions in the new endpoint for updating datapoints (`PATCH /v1/{dataset_name}/datapoints`). The `output` field now matches the inference endpoint (an object with a `raw` field; `pars	Low	11/3/2025
2025.10.9	> [!CAUTION] > Notice on `2025.10.8`: We ran into a technical issue during the release process for `2025.10.8` that resulted in a broken build for the TensorZero Python SDK on PyPI. We've yanked that release and recommend upgrading to this version. > [!CAUTION] > Breaking Changes > > - This release includes small breaking changes to the programmatic observability/dataset APIs (e.g. `list_datapoints`, `experimental_list_inferences`) and the underlying data schema. Moving forward, T	Low	10/31/2025
2025.10.7	> [!CAUTION] > Breaking Changes > > - The default value for `fetch_and_encode_input_files_before_inference` is changing from `true` to `false`. As a result, the gateway will no longer fetch input files before inference, but instead will fetch them in parallel with inference (for observability). In rare cases, this may cause the gateway to receive different input files than those received by model providers. > [!WARNING] > Planned Deprecations > > - Migrate file content blocks fr	Low	10/23/2025
2025.10.6	> [!WARNING] > Planned Deprecations > > - We're renaming "static evaluations" to "inference evaluations" and "dynamic evaluations" to "workflow evaluations". The only action needed is to update `type = "static"` in the configuration to `type = "inference"`. Both versions will be supported until `2026.2+`. Bug Fixes - Fix a bug that dropped tool IDs in output `tool_call` content blocks when updating datapoints. - Prefer magic bytes over the `Content-Type` HTTP response header to	Low	10/21/2025
2025.10.5	Bug Fixes - Add `FinishReason.STOP_SEQUENCE` to the TensorZero Python SDK.	Low	10/20/2025
2025.10.4	> [!WARNING] > Planned Deprecations > > - The `bulk_insert_datapoints` method (`POST /datasets/{dataset_name}/datapoints/bulk`) will be renamed to `create_datapoints` (`POST /datasets/{dataset_name}/datapoints`). Both methods will be available until `2026.2+`. (thanks @BrianLi23!) > [!WARNING] > Completed Deprecations > > - Concluded many small ongoing deprecations: > > - Python SDK: renamed the types `InferenceDataset` → `InferenceDatapoint` and `Node` → `Filter` > -	Low	10/17/2025
2025.10.3	Bug Fixes - Fix bug in the Playground UI that caused inferences containing static tools with custom names (`tools.my_tool.name`) to fail.	Low	10/11/2025
2025.10.2	> [!WARNING] > Planned Deprecations > > - Currently, the gateway automatically includes all dynamic tools in the list of allowed tools. In a near-future release, dynamic tools will no longer be included automatically. If you intend for your dynamic tools to be allowed, please allow them explicitly. > [!WARNING] > Completed Deprecations > > - Finish renaming `datapoint_name` → `task_name` for dynamic evaluations. > - Stop including `--config-file` in the `Dockerfile` for `tensor	Low	10/10/2025
2025.10.1	New Features - Increase default body limit to 100MB for `patch_openai_client`. _& multiple under-the-hood and UI improvements_	Low	10/4/2025
2025.10.0	> [!WARNING] > Planned Deprecations > > - Configure timeouts for embedding models and embedding model providers with `timeout_ms` instead of `timeouts.non_streaming.total_ms`. The latter will be removed in a future release (`2026.1+`). > - Use the gateway CLI flags `--run-clickhouse-migrations` and `--run-postgres-migrations` instead of `--run-migrations-only`. `--run-migrations-only` requires credentials for both databases, even though Postgres is an optional dependency, so it will be r	Low	10/2/2025
2025.9.6	Bug Fixes - Implemented a workaround for an upstream bug in `opentelemetry-otlp` that caused our OTLP exporter to fail to send data to encrypted endpoints. New Features - Added multiple small improvements to the evaluations UI to streamline common workflows and simplify debugging. _& multiple under-the-hood and UI improvements_	Low	9/29/2025
2025.9.5	New Features - Add model observability page to the UI with model throughput and latency analytics. - Add support for OpenInference format when exporting OpenTelemetry traces. - Expand support of UI features for the default function (e.g. "Try with model"). - Add support for supervised fine-tuning (SFT) with GCP Vertex AI Gemini in the UI. - Improve the performance of episode table in the UI. - Add an example of using the programmatic workflow for dynamic in-context learning. _& mu	Low	9/25/2025
2025.9.4	> [!WARNING] > Planned Deprecations > > - Rename types from `Dicl` to `DICL` in the Python SDK for consistency. Both versions work for now, and the deprecated types will be removed in a future release (`2025.12+`). Bug Fixes - Fix a regression in the UI that prevented `chat` datapoints from being edited. New Features - Expand the prompt templates and schemas functionality to support unlimited templates per function. - Support appending to existing DICL variants in t	Low	9/16/2025
2025.9.3	New Features - Add support for dynamic OTLP headers when exporting OpenTelemetry traces. - Add support for `allowed_tools` field in the OpenAI-compatible inference endpoint. - Improve performance by automatically adjusting the number of HTTP2 connections to model providers based on concurrency. _& multiple under-the-hood and UI improvements (thanks @yuria-loo!)_	Low	9/12/2025
2025.9.1	Bug Fixes - Fix a regression that prevented rendering of inferences with `thought` content blocks in the UI. - Stop logging HTTP requests and responses twice in debug mode. New Features - Add a programmatic API for reinforcement fine-tuning (RFT) with OpenAI. - Provide defaults for individual fields in the `retries` configuration. - Allow users to specify the Azure provider endpoint dynamically. (thanks @Dineshm-coder!) - Improve error messages when the gateway is missing cr	Low	9/8/2025
2025.9.0	> [!CAUTION] > > Breaking Changes > > - The bug fix for `feedback_id` technically introduces a breaking change in the TensorZero Python SDK. The field is no longer incorrectly doubly nested and now matches the SDK's type annotations. > [!WARNING] > Completed Deprecations > > - `json_mode` is now required for JSON function variants. Bug Fixes - Added workarounds for two ClickHouse regressions (ClickHouse/ClickHouse#86415, ClickHouse/ClickHouse#86557) introduced in Clic	Low	9/3/2025
2025.8.5	Bug Fixes - Reduce the ClickHouse memory footprint in large deployments with human feedback for evaluations. New Features - Add a programmatic optimization interface for dynamic in-context learning. - Expose more hyperparameters for programmatic supervised fine-tuning with Together AI. _& many under-the-hood and UI improvements (thanks @quangIO!)_	Low	8/29/2025
2025.8.4	> [!WARNING] > Planned Deprecations > > * The OpenAI-compatible embeddings endpoint will require the prefix `tensorzero::embedding_model_name::` for model names (e.g. `tensorzero::embedding_model_name::openai::text-embedding-3-small`). Support for unprefixed names will be removed in a future release (`2025.12+`). Bug Fixes - Fix a ClickHouse warning that occurred when a model inference had input tokens set to null and output tokens non-null, or vice versa. This issue only caused	Low	8/27/2025
2025.8.3	> [!CAUTION] > Breaking Changes > > * Temporarily removing support for batching writes to ClickHouse with the embedded gateway in Python: In the previous release, we added support for batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale (default off). Later, we discovered that in rare scenarios, the Python GIL could interfere with this setting in embedded clients and cause a deadlock. While we investigate a solution, we are removing support for	Low	8/21/2025
2025.8.2	New Features - Add a Playground to the UI to compare variants side-by-side, iterate on prompts quickly, and replay inference requests. - Support batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale. - Add a Jupyter notebook recipe for supervised fine-tuning with Unsloth. _& many under-the-hood and UI improvements (thanks @contrun @lblack00!)_	Low	8/12/2025
2025.8.1	New Features * Add an OpenAI-compatible endpoint for embeddings, with support for OpenAI (& OpenAI-compatible) and Azure OpenAI Service model providers. * Add support for self-hosted replicated ClickHouse databases. * Parse `reasoning_content` from Fireworks and vLLM model providers. * Improve error messages for AWS Bedrock and AWS SageMaker model providers. Bug Fixes * Allow configuration to specify `description` for JSON functions. * Fix a regression where function descrip	Low	8/11/2025
2025.8.0	New Features - Add `gateway.observability.skip_completed_migrations` configuration option to reduce gateway startup time and database load. When enabled, the gateway will skip running the ClickHouse migration workflow (i.e. verifying and potentially applying every migration) on startup for migrations that are already present in a database table that tracks migration history. - Support `raw_text` content blocks in the OpenAI-compatible inference endpoint. (Thanks @hongantran3804 @pykm05 @	Low	8/6/2025
