# sglang

> SGLang is a fast serving framework for large language models and vision language models.

- **URL**: https://www.freshcrate.ai/projects/sglang
- **Author**: pypi
- **Category**: Frameworks
- **Latest version**: `v0.5.12.post1` (2026-05-26)
- **License**: non-standard
- **Source**: https://github.com/sgl-project/sglang
- **Homepage**: https://pypi.org/project/sglang/
- **Language**: Python
- **GitHub**: 26,220 stars, 5,484 forks
- **Registry**: pypi (`sglang`)
- **Tags**: `pypi`

## Description

<div align="center" id="sglangtop">
<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400" margin="10px"></img>

[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)
![PyPI - Downloads](https://static.pepy.tech/badge/sglang?period=month)
[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/sgl-project/sglang)

</div>

--------------------------------------------------------------------------------

<p align="center">
<a href="https://lmsys.org/blog/"><b>Blog</b></a> |
<a href="https://docs.sglang.io/"><b>Documentation</b></a> |
<a href="https://roadmap.sglang.io/"><b>Roadmap</b></a> |
<a href="https://slack.sglang.io/"><b>Join Slack</b></a> |
<a href="https://meet.sglang.io/"><b>Weekly Dev Meeting</b></a> |
<a href="https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides"><b>Slides</b></a>
</p>

## News
- [2026/02] 🔥 Unlocking 25x Inference Performance with SGLang on NVIDIA GB300 NVL72 ([blog](https://lmsys.org/blog/2026-02-20-gb300-inferencex/)).
- [2026/01] 🔥 SGLang Diffusion accelerates video and image generation ([blog](https://lmsys.org/blog/2026-01-16-sglang-diffusion/)).
- [2025/12] SGLang provides day-0 support for the latest open models ([MiMo-V2-Flash](https://lmsys.org/blog/2025-12-16-mimo-v2-flash/), [Nemotron 3 Nano](https://lmsys.org/blog/2025-12-15-run-nvidia-nemotron-3-nano/), [Mistral Large 3](https://github.com/sgl-project/sglang/pull/14213), [LLaDA 2.0 Diffusion LLM](https://lmsys.org/blog/2025-12-19-diffusion-llm/), [MiniMax M2](https://lmsys.org/blog/2025-11-04-miminmax-m2/)).
- [2025/10] 🔥 SGLang now runs natively on TPU with the SGLang-Jax backend ([blog](https://lmsys.org/blog/2025-10-29-sglang-jax/)).
- [2025/09] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput ([blog](https://lmsys.org/blog/2025-09-25-gb200-part-2/)).
- [2025/09] SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention ([blog](https://lmsys.org/blog/2025-09-29-deepseek-V32/)).
- [2025/08] SGLang x AMD SF Meetup on 8/22: Hands-on GPU workshop, tech talks by AMD/xAI/SGLang, and networking ([Roadmap](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_roadmap.pdf), [Large-scale EP](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_ep.pdf), [Highlights](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_highlights.pdf), [AITER/MoRI](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_aiter_mori.pdf), [Wave](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_wave.pdf)).

<details>
<summary>More</summary>

- [2025/11] SGLang Diffusion accelerates video and image generation ([blog](https://lmsys.org/blog/2025-11-07-sglang-diffusion/)).
- [2025/10] PyTorch Conference 2025 SGLang Talk ([slide](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/sglang_pytorch_2025.pdf)).
- [2025/10] SGLang x Nvidia SF Meetup on 10/2 ([recap](https://x.com/lmsysorg/status/1975339501934510231)).
- [2025/08] SGLang provides day-0 support for OpenAI gpt-oss model ([instructions](https://github.com/sgl-project/sglang/issues/8833)).
- [2025/06] SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z ([a16z blog](https://a16z.com/advancing-open-source-ai-through-benchmarks-and-bold-experimentation/)).
- [2025/06] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput ([blog](https://lmsys.org/blog/2025-06-16-gb200-part-1/)).
- [2025/05] Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs ([blog](https://lmsys.org/blog/2025-05-05-large-scale-ep/)).
- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html)).
- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog](https://pytorch.org/blog/sglang-joins-pytorch/)).
- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html)).
- [2025/01] SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions](https://github.com/sgl-project/sglang/tree/main/benchma

</details>

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v0.5.12.post1` | 2026-05-26 | High | Stability patch on top of v0.5.12 that cherry-picks 12 fixes, primarily for DeepSeek V4, onto the release branch. **Bug fixes (DeepSeek V4):** (1) DSV4-Pro emits garbled text during single-token decode on B200/B300; fixed in the `deep_gemm` UE8M0 scale-packing path by ceiling activation scales before packing (#25733). (2) DSV4 + EAGLE/MTP in disaggregation decode crashes around 2000 requests with a SWA allocator assertion, because recycled KV pages kept stale sliding-window mappings (#25805). |
| `v0.5.12` | 2026-05-16 | High | **Highlights.** **DeepSeek V4 support**: full inference path for DeepSeek-V4 (#23882). Day-0 features (#23882): parallelism (Tensor Parallelism, Expert Parallelism, Context Parallelism, Data Parallel Attention); hardware (NVIDIA B300/B200/H200/H100/GB200/GB300, AMD MI35X); Prefill-Decode Disaggregation; HiSparse for offloading inactive KV cache to CPU memory; reasoning parser and tool call parser; DeepGemm and FlashMLA kernels for DeepSeek V4, in |
| `v0.5.11` | 2026-05-05 | High | **Highlights.** **CUDA 13 + Torch 2.11**: the default CUDA version moves to 13.0 across SGLang, sgl-kernel, and Docker images, and PyTorch is upgraded from 2.9 to 2.11, modernizing the build matrix and unlocking newer kernels: #21247, #24162, #24183, #23593 ([tracking issue #21498](https://github.com/sgl-project/sglang/issues/21498)). **Speculative Decoding V2 by default**: Spec V2 (with overlap scheduling to hide CPU overhead) is now the default, materially reducing per-step CPU cost for EAG |
| `0.5.10.post1` | 2026-04-21 | Low | Imported from PyPI (0.5.10.post1) |
| `v0.5.10.post1` | 2026-04-09 | Medium | **Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.5.10...v0.5.10.post1  Bumps flashinfer from v0.6.7.post2 to v0.6.7.post3 to resolve an issue in its jit cubin downloader. |
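
The `v0.5.12.post1` entry mentions ceiling activation scales before UE8M0 packing. As a rough illustration of that idea only (not the `deep_gemm` or SGLang code), the sketch below rounds a positive scale up to the next power of two and stores just its biased 8-bit exponent. The bias of 127 and the reserved code 255 follow the common E8M0 convention; the helper names and the four-codes-per-32-bit-word layout in `pack4` are assumptions made for this example. One plausible reason to round up rather than down is that a decoded scale smaller than the true one would push quantized values past the representable range.

```python
import math

UE8M0_BIAS = 127  # exponent bias of the E8M0 convention (assumed here)


def ue8m0_encode_ceil(scale: float) -> int:
    """Return the 8-bit exponent code of the smallest power of two >= scale."""
    if not (math.isfinite(scale) and scale > 0.0):
        raise ValueError("scale must be a positive finite number")
    # frexp gives scale = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(scale)
    # An exact power of two has mantissa 0.5 and needs no rounding up.
    log2_ceil = exponent - 1 if mantissa == 0.5 else exponent
    # Clamp to the usable code range; 255 is treated as reserved (NaN).
    # Clamping at 254 cannot preserve the ceiling property for huge scales.
    return max(0, min(254, log2_ceil + UE8M0_BIAS))


def ue8m0_decode(code: int) -> float:
    """Decode an exponent-only code back to the power of two it represents."""
    return math.ldexp(1.0, code - UE8M0_BIAS)


def pack4(codes: list[int]) -> int:
    """Hypothetical layout: pack four 8-bit codes into one 32-bit word,
    with the first code in the lowest byte."""
    if len(codes) != 4:
        raise ValueError("expected exactly four codes")
    word = 0
    for i, code in enumerate(codes):
        word |= (code & 0xFF) << (8 * i)
    return word


if __name__ == "__main__":
    for s in (0.3, 1.0, 3.0):
        code = ue8m0_encode_ceil(s)
        print(s, code, ue8m0_decode(code))
```

In this sketch the decoded scale is never smaller than the input (except at the upper clamp), which is the property the "ceiling" step is meant to guarantee.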

## Citation

- HTML: https://www.freshcrate.ai/projects/sglang
- Markdown: https://www.freshcrate.ai/projects/sglang.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/sglang/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._
