freshcrate

xgrammar

Efficient, Flexible and Portable Structured Generation

Description

<div align="center" id="top">
<img src="https://raw.githubusercontent.com/mlc-ai/xgrammar/main/assets/logo.svg" alt="logo" width="400" margin="10px"></img>

[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://xgrammar.mlc.ai/docs/)
[![License](https://img.shields.io/badge/license-apache_2-blue)](https://github.com/mlc-ai/xgrammar/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/xgrammar)](https://pypi.org/project/xgrammar)
[![PyPI Downloads](https://static.pepy.tech/badge/xgrammar)](https://pepy.tech/projects/xgrammar)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/mlc-ai/xgrammar)

**Efficient, Flexible and Portable Structured Generation**

[Get Started](#get-started) | [Documentation](https://xgrammar.mlc.ai/docs/) | [Blogpost](https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar) | [Technical Report](https://arxiv.org/abs/2411.15100)
</div>

## News

- [2025/12] XGrammar has been officially integrated into [Mirai](https://github.com/trymirai/uzu).
- [2025/09] XGrammar has been officially integrated into [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai).
- [2025/02] XGrammar has been officially integrated into [Modular's MAX](https://docs.modular.com/max/serve/structured-output).
- [2025/01] XGrammar has been officially integrated into [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
- [2024/12] XGrammar has been officially integrated into [vLLM](https://github.com/vllm-project/vllm).
- [2024/12] We presented research talks on XGrammar at CMU, UC Berkeley, MIT, THU, SJTU, Ant Group, LMSys, Qingke AI, and Camel AI. The slides can be found [here](https://docs.google.com/presentation/d/1iS7tu2EV4IKRWDaR0F3YD7ubrNqtGYUStSskceneelc/edit?usp=sharing).
- [2024/11] XGrammar has been officially integrated into [SGLang](https://github.com/sgl-project/sglang).
- [2024/11] XGrammar has been officially integrated into [MLC-LLM](https://github.com/mlc-ai/mlc-llm).
- [2024/11] We officially released XGrammar v0.1.0!

## Overview

XGrammar is an open-source library for efficient, flexible, and portable structured generation. It leverages constrained decoding to ensure **100% structural correctness** of the output. It supports general context-free grammars to enable a broad range of structures, including **JSON**, **regex**, and **custom context-free grammars**.

XGrammar uses careful optimizations to achieve extremely low overhead in structured generation. It achieves **near-zero overhead** in JSON generation, making it one of the fastest structured generation engines available.

XGrammar features **universal deployment**. It supports:

* **Platforms**: Linux, macOS, Windows
* **Hardware**: CPU, NVIDIA GPU, AMD GPU, Apple Silicon, TPU, etc.
* **Languages**: Python, C++, and JavaScript APIs
* **Models**: Qwen, Llama, DeepSeek, Phi, Gemma, etc.

XGrammar is easy to integrate with LLM inference engines. It is the default structured generation backend for most LLM inference engines, including [**vLLM**](https://github.com/vllm-project/vllm), [**SGLang**](https://github.com/sgl-project/sglang), [**TensorRT-LLM**](https://github.com/NVIDIA/TensorRT-LLM), and [**MLC-LLM**](https://github.com/mlc-ai/mlc-llm), and is used by many other companies. You can also try out their structured generation modes!

## Get Started

Install XGrammar:

```bash
pip install xgrammar
```

For use with MPS on Apple Silicon, install with:

```bash
pip install "xgrammar[metal]"
```

Import XGrammar:

```python
import xgrammar as xgr
```

Please visit our [documentation](https://xgrammar.mlc.ai/docs/) to get started with XGrammar.
- [Installation](https://xgrammar.mlc.ai/docs/start/installation)
- [Quick start](https://xgrammar.mlc.ai/docs/start/quick_start)

## Third-Party Bindings

- **Rust**: [xgrammar-rs](https://github.com/trymirai/xgrammar-rs) — Community Rust bindings for XGrammar.

## Collaborators

XGrammar has been widely adopted in industry, open-source projects, and academia. Our collaborators include NVIDIA, Databricks, Meta, Google, xAI, and DeepSeek.
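The constrained-decoding idea behind XGrammar can be sketched in a few lines: before each sampling step, every token that cannot extend a grammar-valid prefix has its logit masked to negative infinity, so the decoded output is structurally correct by construction. The snippet below is an illustrative toy in plain Python, not XGrammar's actual API; the bracket "grammar", the four-token vocabulary, and all helper names are invented for the example. (XGrammar itself does this with compiled grammars and bit-packed token bitmasks to keep the per-step overhead near zero.)

```python
import math

# Hypothetical tiny vocabulary; the real vocabulary would be the model tokenizer's.
VOCAB = ["[", "]", "x", "<eos>"]

def allowed_tokens(prefix: str) -> set[str]:
    """Toy 'grammar': only balanced bracket strings are valid.

    Returns the tokens that keep the prefix extendable to a valid string.
    """
    depth = prefix.count("[") - prefix.count("]")
    allowed = {"["}                      # opening a bracket is always legal
    if depth > 0:
        allowed |= {"]", "x"}            # inside brackets: close or emit content
    if depth == 0 and prefix:
        allowed.add("<eos>")             # balanced and non-empty: may stop
    return allowed

def apply_mask(logits: list[float], prefix: str) -> list[float]:
    """Set logits of grammar-invalid tokens to -inf before sampling."""
    ok = allowed_tokens(prefix)
    return [l if t in ok else -math.inf for t, l in zip(VOCAB, logits)]

def greedy_decode(fake_logits: list[list[float]]) -> str:
    """Greedy decoding over pre-supplied 'model' logits, one list per step."""
    out = ""
    for step_logits in fake_logits:
        masked = apply_mask(step_logits, out)
        tok = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
        if tok == "<eos>":
            break
        out += tok
    return out

# Even though the model "prefers" an invalid "]" at step one (logit 5.0),
# the mask forces a valid opening bracket, and decoding yields "[]".
print(greedy_decode([[1.0, 5.0, 0.0, 0.0],
                     [0.0, 3.0, 1.0, 9.0],
                     [0.0, 0.0, 0.0, 5.0]]))  # → []
```

This is the same mask-then-sample loop that inference engines run with XGrammar's bitmasks; the engineering work the library does is making `allowed_tokens` fast for arbitrary context-free grammars over real vocabularies.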

Release History

Each entry lists the version, its changes, update urgency, and release date.
**0.1.33** (4/21/2026, urgency: Low)

Imported from PyPI (0.1.33).
**v0.1.33** (3/27/2026, urgency: Medium)

**What's Changed**

* refactor: simplify TagDispatch by removing stop_eos and stop_str by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/554
* feat: provide `PlusFormat`, `OptionalFormat`, `StarFormat` to enhance `StructuralTag` by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/557
* feat: support `structural_tag`-level cache by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/553
* feat: token-level grammar support with Token/ExcludeToken/TokenTagDispatch edges by …
**v0.1.32** (3/4/2026, urgency: Low)

**What's Changed**

* Third-party rust support integration by @eugenebokhan in https://github.com/mlc-ai/xgrammar/pull/531
* feat: support crossing-grammar cache by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/526
* refactor: refactor the structure of structural_tag for better extensibility by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/528
* fix: fix the doc building by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/533
* refactor: refactor the struc…
**v0.1.31** (1/19/2026, urgency: Low)

v0.1.30 will soon be yanked due to issues in `apply_token_bitmask_inplace` and Windows compatibility problems with crossing-grammar caching. Please use v0.1.31 instead.

**What's Changed**

* fix the apply_token_bit_mask with auto by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/521
* Revert "feat: support crossing-grammar cache. (#508)" by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/522
* Bump to v0.1.31 by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/523
**v0.1.30** (1/17/2026, urgency: Low)

**What's Changed**

* [Feature] Allow empty separator string for `tags_with_separator` by @ricohasgit in https://github.com/mlc-ai/xgrammar/pull/503
* feat: provide excluded strs for any_text and tagdispatch by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/502
* chore: update cibuildwheel to build more wheels by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/509
* [BugFix] Fix error `TypeError: apply_token_bitmask_inplace_cpu(): incompatible function arguments` by @wju…
**v0.1.29** (12/19/2025, urgency: Low)

**What's Changed**

* Add TraverseDraftTree for speculative decoding bitmask generation by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/490
* [Web] Support structural tag in web-xgrammar package, upgrade version to 0.1.27 by @akaashrp in https://github.com/mlc-ai/xgrammar/pull/482
* Fix warp_size in triton kernel for AMD GPUs by @divakar-amd in https://github.com/mlc-ai/xgrammar/pull/476
* [Fix] Fix the tokenizer VocabType detection for Kimi-K2-Instruct. (#483) by @eraser00 in https://gi…
**v0.1.28** (12/9/2025, urgency: Low)

**What's Changed**

* Fix web emscripten bindings, bump to 0.1.26 by @loganzartman in https://github.com/mlc-ai/xgrammar/pull/456
* fix(build): Disable LTO on riscv64 to fix linker error by @ihb2032 in https://github.com/mlc-ai/xgrammar/pull/458
* remove ninja from dependencies by @dotlambda in https://github.com/mlc-ai/xgrammar/pull/459
* [CI] Add a CI to run unit tests every day by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/463
* [Fix] Fix the efficiency issue in repetition …
**v0.1.27** (11/4/2025, urgency: Low)

**What's Changed**

* [Fix] Fix JsonSchemaConverter for numbers "-0. ..." by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/462
* [test] Add a build test for Web-XGrammar by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/457
* [CI] Fix web test by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/464
* Bump to v0.1.27 by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/466

**Full Changelog**: https://github.com/mlc-ai/xgrammar/compare/v0.1.26...v0.1.27
**v0.1.26** (10/20/2025, urgency: Low)

v0.1.26 brings a series of batched methods for token mask generation. This version also fixes several issues and improves efficiency.

**What's Changed**

* Update the README section by @Ubospica
* [Refac] Refactor the pipeline by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/428
* [Feature] Support openGPT-x tokenizer by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/446
* [Refac] Refactor repetition expressions to reduce the uncertainty by @Seven-Streams…
**v0.1.25** (9/21/2025, urgency: Low)

v0.1.25 brings the structural tag. It also fixes a couple of problems regarding JSON schema and py.typed, and speeds up grammar compilation and mask generation.

**What's Changed**

* Json schema generation, limit the number of whitespaces by @ExtReMLapin in https://github.com/mlc-ai/xgrammar/pull/414
* [Fix] Fix the failure to build the website by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/415
* Update README.md to announce that OpenVINO GenAI uses XGrammar by @apa…
**v0.1.24** (9/4/2025, urgency: Low)

This version brings many bug fixes. It also speeds up the repeat grammar, especially when the repeat count is large.

**What's Changed**

* [Fix] Fix the print function of AdaptiveTokenMask by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/404
* [Fix] Fix the broken link of the website by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/408
* [Optim] Optimize repetition expressions when the max repetition time is unbounded by @Seven-Streams in https://github.c…
**v0.1.23** (8/15/2025, urgency: Low)

**Highlights**

* Significant speedup for grammars with repeat, e.g. `a{1, 100}`; its preprocessing is now O(1) instead of O(n)
* Release the new serialization library
* Fix bugs about max_rollback_tokens
* Fix bugs in CUDA kernels
* Refactor: migrate the grammar backend to FSMs

**What's Changed**

* [Refactor] JSON serializer and MemorySize by @DarkSharpness in https://github.com/mlc-ai/xgrammar/pull/380
* [Feature] Add a new expression to represent repetition to speed up by @Sev…
**v0.1.22** (7/27/2025, urgency: Low)

**Highlights**

* Enhanced Earley parser with FSM support, and improved the TagDispatch intrinsic
* Support non-contiguous logits and bitmasks
* JSON Schema enhancements: support `minProperties`, `maxProperties`, `patternProperties`, `propertyNames`
* Earley parser bug fixes: resolved boundary-check issues and behavioral inconsistencies that could affect parsing accuracy
* Improved debugging experience

**What's Changed**

* …
**v0.1.21** (7/10/2025, urgency: Low)

v0.1.21 brings significant performance improvements and resolves previous issues with infinite loops and system freezes. It introduces the Earley parser. Preprocessing is now up to six times faster than before (see #308), and it no longer suffers from the previous problems of exponential growth in the number of states or infinite (or very deep) recursion. It also fixes the cache-corruption problem that occurred when the input to GrammarCompiler was invalid.

**What's Ch**…
**v0.1.20** (6/30/2025, urgency: Low)

**Summary**

* Fix Windows build flags for Torch extensions
* Refactor the Parser and FSM libraries for improved maintainability
* Expose accept_string and print_internal_states with configurable max recursion depth
* Correct acceptance of certain invalid characters in JSON strings
* Make tiktoken and sentencepiece optional dependencies
* Support int64 values in JSON Schema conversion
* Add a C++ reflection-based JSON serialization method

**What's Changed**

* Fix torch extension build cf…
**v0.1.19** (5/8/2025, urgency: Low)

**What's Changed**

* Support minItems, maxItems for array by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/296
* [fix] fix the handling for '\?' by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/305
* [CI] Add a check in benchmark.yaml by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/301
* [CI Fix] Fix the check for benchmark by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/309
* [Feature] Rewrite the FSM.h to support some regex grammar by @Seven-…
**v0.1.18** (4/8/2025, urgency: Low)

**Highlights**

* Provides an MLX kernel to support Mac devices
* Fixes bugs in the JSON schema converter
* Fixes the compilation failure of Torch kernels
* Supports float range and format in JSON schema
* Adds a function testing._is_single_token_bitmask for spec decoding
* Releases the GIL for most methods

**What's Changed**

* Bump to v0.1.17 by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/271
* [Feature] LRU cache for grammar compiler by @DarkSharpness in https://github.com/mL…
**v0.1.17** (3/25/2025, urgency: Low)

**What's Changed**

* Update README.md by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/246
* [Feature] Add CI to close inactive issues and check formats by @Seven-Streams in https://github.com/mlc-ai/xgrammar/pull/241
* Enable editable installation by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/247
* Cleanup useless build scripts by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/248
* fix(python): use correct name for pybind11 FindPython mode by @henryiii in https://gi…
**v0.1.16** (3/15/2025, urgency: Low)

**What's Changed**

* [Fix] Postpone cuda import to the calling site by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/231
* [Feature] Support model vocab size being less than tokenizer by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/237
* [Style] Remove unused headers by @DarkSharpness in https://github.com/mlc-ai/xgrammar/pull/219
* Fallback to triton if we fail to compile for CUDA by @zbowling in https://github.com/mlc-ai/xgrammar/pull/223
* [Feature] Build and run C++ Python…
**v0.1.15** (3/5/2025, urgency: Low)

**What's Changed**

* Update README.md by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/225
* [Feature] Expose Grammar.union by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/227
* [Feature] Support Deepseek R series models by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/226
* Bump to v0.1.15 by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/228

**Full Changelog**: https://github.com/mlc-ai/xgrammar/compare/v0.1.14...v0.1.15
**v0.1.14** (2/26/2025, urgency: Low)

This version supports XGrammar on Linux Arm64, removes the glibc 2.28 restriction, and adds a source distribution. It fixes the self-recursion error and extends apply_token_mask_inplace to support some corner cases. It changes the StructuralTag API, renaming the field "start" to "begin".

**What's Changed**

* update GenerateRangeRegex by @zanderjiang in https://github.com/mlc-ai/xgrammar/pull/182
* [Fix] Fix compatibility for apply_token_bitmask_inplace by @Ubospica in h…
**v0.1.13** (2/13/2025, urgency: Low)

This version enhances XGrammar's compatibility across platforms and provides full support for regex; most features are now supported. It also makes the token bitmask application kernel more efficient.

**What's Changed**

* [Fix] Fix popcount for windows by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/167
* [Fix] Rollback safely when token acceptance fails by @benchislett in https://github.com/mlc-ai/xgrammar/pull/164
* fix(fsm): fix error of dangli…
**v0.1.11** (2/13/2025, urgency: Low)

This release adds the structural tag, a new feature that supports strict function calling (and many more flexible patterns).

**What's Changed**

* [Feature] Structural tag by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/162
* [Fix] Fix #162 by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/163
* [Fix] Fix broken rollback char count after accepting stop token by @benchislett in https://github.com/mlc-ai/xgrammar/pull/161
* [Feature] Optional Token…
**v0.1.10** (2/13/2025, urgency: Low)

This version enhances JSON schema and EBNF support and provides APIs for grammar concatenation and union.

**What's Changed**

* [Feature] Support regex and repetition range by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/144
* [Refactor] Rename internal classes for better structure by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/145
* [FunctionCalling] Support TagDispatch by @Ubospica in https://github.com/mlc-ai/xgrammar/pull/146
* [Fix] Fix doc dependen…
**v0.1.8** (12/25/2024, urgency: Low)

**Features**

* Enhance JSON Schema converter by @Ubospica in #134
* Enhance EBNF Parser by @observerw in #125
* Support sentencepiece tokenizer by @zanderjiang in #120
* Enhance ApplyMask kernels to provide better support in mixed structured-and-unstructured cases by @Ubospica in #128
**v0.1.6** (12/7/2024, urgency: Low)

**Features**

* The JSON Schema converter now supports integer ranges and regex patterns. Thanks @joennlae!

**Bug Fixes**

* Fixes strict JSON formatting degrading LLM output quality.
**v0.1.4** (11/25/2024, urgency: Low)

This is the stable release version of XGrammar. It provides an efficient and portable API for LLM structured generation.


Similar Packages

- **apache-tvm-ffi**: tvm ffi (0.1.10)
- **outlines-core**: Structured Text Generation in Rust (0.2.14)
- **magika**: A tool to determine the content type of a file with deep learning (1.0.2)
- **nvidia-cuda-cupti-cu12**: CUDA profiling tools runtime libs. (12.9.79)
- **gymnasium**: A standard API for reinforcement learning and a diverse set of reference environments (formerly Gym). (1.2.3)