transformers
Transformers: the model-definition framework for state-of-the-art machine learning across text, vision, audio, video, and multimodal models, for both inference and training.
Description
<!--- Copyright 2020 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg"> <img alt="Hugging Face Transformers Library" src="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg" width="352" height="59" style="max-width: 100%;"> </picture> <br/> <br/> </p> <p align="center"> <a href="https://huggingface.com/models"><img alt="Checkpoints on Hub" src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen"></a> <a href="https://circleci.com/gh/huggingface/transformers"><img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/main"></a> <a href="https://github.com/huggingface/transformers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue"></a> <a href="https://huggingface.co/docs/transformers/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online"></a> <a href="https://github.com/huggingface/transformers/releases"><img 
alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg"></a> <a href="https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg"></a> <a href="https://zenodo.org/badge/latestdoi/155220641"><img src="https://zenodo.org/badge/155220641.svg" alt="DOI"></a> </p> <h4 align="center"> <p> <b>English</b> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hans.md">简体中文</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hant.md">繁體中文</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ko.md">한국어</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_es.md">Español</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> | <a 
href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> | </p> </h4> <h3 align="center"> <p>State-of-the-art pretrained models for inference and training</p> </h3> <h3 align="center"> <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/> </h3> Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer vision, audio, video, and multimodal models, for both inference and training. It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...), and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
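As a quick sketch of the unified model-definition API described above, the snippet below loads a checkpoint through `pipeline`. The checkpoint name `openai-community/gpt2` is only an illustrative choice; any Hub checkpoint with a supported model definition loads the same way.

```python
from transformers import pipeline

# A single pipeline call resolves the model definition, config,
# weights, and tokenizer from the Hub checkpoint name.
generator = pipeline("text-generation", model="openai-community/gpt2")

# Returns a list of dicts, each with a "generated_text" key.
result = generator("Transformers is", max_new_tokens=20)
print(result[0]["generated_text"])
```

Because the model definition lives in `transformers`, the same checkpoint can then be served unchanged by the inference engines listed above.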
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 5.5.4 | Imported from PyPI (5.5.4) | Low | 4/21/2026 |
| v5.5.4 | # Patch release v5.5.4 This patch contains fixes that were good to have ASAP, mostly for tokenizers: - Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (#45305) by ArthurZucker For training: - Fix #45305 + add regression test GAS (#45349) by florian6973, SunMarc - Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#…) by ArthurZucker And for Qwen2.5-VL: - Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by Kash6, zucchi | Medium | 4/13/2026 |
| v5.5.3 | Small patch release to fix `device_map` support for Gemma4! It contains the following commit: - [gemma4] Fix device map auto (#45347) by @Cyrilvallez | Medium | 4/9/2026 |
| v5.5.2 | Small patch dedicated to optimizing gemma4, fixing inference with `use_cache=False` due to k/v states sharing between layers, as well as conversion mappings for some models that would inconsistently serialize their weight names. It contains the following PRs: - Add MoE to Gemma4 TP plan (#45219) by @sywangyi and @Cyrilvallez - [gemma4] Dissociate kv states sharing from the Cache (#45312) by @Cyrilvallez - [gemma4] Remove all shared weights, and silently skip them during loading (#45336) by | Medium | 4/9/2026 |
| v5.5.1 | # Patch release v5.5.1 This patch is very small and focuses on vLLM and Gemma4! - Fix export for gemma4 and add Integration tests (#45285) by @Cyrilvallez - Fix vllm cis (#45139) by @ArthurZucker | Medium | 4/9/2026 |
| v5.5.0 | # Release v5.5.0 <img width="2786" height="1504" alt="image" src="https://github.com/user-attachments/assets/6c8c878f-042b-4858-9f64-73fd9ccd7e4b" /> ## New Model additions ### Gemma4 [Gemma 4](INSET_PAPER_LINK) is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as the previous Gemma versions. The key differences are a vision processor that can output images of fixed token budget and a sp | Medium | 4/2/2026 |
| v5.4.0 | ## New Model additions ### VidEoMT <img width="1480" height="460" alt="image" src="https://github.com/user-attachments/assets/bec6fc25-b0ab-4227-8c2b-a838554f37f3" /> Video Encoder-only Mask Transformer (VidEoMT) is a lightweight encoder-only model for online video segmentation built on a plain Vision Transformer (ViT). It eliminates the need for dedicated tracking modules by introducing a lightweight query propagation mechanism that carries information across frames and employs a query | Medium | 3/27/2026 |
| v5.3.0 | ## New Model additions ### EuroBERT <img width="1080" height="1080" alt="image" src="https://github.com/user-attachments/assets/33603f42-5435-421a-9641-baf72faacb22" /> EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens. **Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_d | Low | 3/4/2026 |
| v5.2.0 | ## New Model additions ### VoxtralRealtime <img width="1920" height="1080" alt="image" src="https://github.com/user-attachments/assets/80e37670-6d70-402b-8c8e-ccfb8c32df2d" /> VoxtralRealtime is a streaming speech-to-text model from [Mistral AI](https://mistral.ai), designed for real-time automatic speech recognition (ASR). Unlike the offline [Voxtral](./voxtral) model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by proc | Low | 2/16/2026 |
| v5.1.0 | ## New Model additions ### EXAONE-MoE <img width="2278" height="1142" alt="image" src="https://github.com/user-attachments/assets/0c3d5341-0483-49c3-8467-f9784ec94b37" /> K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agen | Low | 2/5/2026 |
| v5.0.0 | ## Transformers v5 release notes <img width="1800" height="1013" alt="image" src="https://github.com/user-attachments/assets/7b5187d7-6945-4108-a546-6d1d7bfb55e3" /> - Highlights - Significant API changes: dynamic weight loading, tokenization - Backwards Incompatible Changes - Bugfixes and improvements We have a migration guide that will be continuously updated available on the `main` branch, please check it out in case you're facing issues: [migration guide](https://github.com/huggi | Low | 1/26/2026 |
| v5.0.0rc3 | # Release candidate v5.0.0rc3 ## New models: * [GLM-4.7] GLM-Lite Support by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43031 * [GLM-Image] AR Model Support for GLM-Image by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43100 * Add LWDetr model by @sbucaille in https://github.com/huggingface/transformers/pull/40991 * Add LightOnOCR model implementation by @baptiste-aubertin in https://github.com/huggingface/transformers/pull/41621 ## Wha | Low | 1/26/2026 |
| v4.57.6 | ## What's Changed Another fix for qwen vl models that prevented correctly loading the associated model type - this works together with https://github.com/huggingface/transformers/pull/41808 of the previous patch release. * Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again by @i3hz in https://github.com/huggingface/transformers/pull/41758 **Full Changelog**: https://github.com/huggingface/transformers/compare/v4.57.5...v4.57.6 | Low | 1/16/2026 |
| v4.57.5 | ## What's Changed Should not have said last patch :wink: These should be the last remaining fixes that got lost in between patches and the transition to v5. * QwenVL: add skipped keys in setattr as well by @zucchini-nlp in https://github.com/huggingface/transformers/pull/41808 * Fix lr_scheduler_parsing by @SunMarc in https://github.com/huggingface/transformers/pull/41322 **Full Changelog**: https://github.com/huggingface/transformers/compare/v4.57.4...v4.57.5 | Low | 1/13/2026 |
| v4.57.4 | ## What's Changed Last patch release for v4: We have a few small fixes for remote generation methods (e.g. group beam search), vLLM, and an offline tokenizer fix (if it's already been cached). * Grouped beam search from config params by @zucchini-nlp in https://github.com/huggingface/transformers/pull/42472 * Handle decorator with optional arguments better @hmellor in https://github.com/huggingface/transformers/pull/42512 * fix: make mistral base check conditional to fix offline loading by | Low | 1/13/2026 |
| v5.0.0rc2 | ## What's Changed This release candidate is focused on fixing `AutoTokenizer`, expanding the dynamic weight loading support, and improving performances with MoEs! ## MoEs and performances: <img width="2048" height="1451" alt="image" src="https://github.com/user-attachments/assets/3ed2508e-3eb1-4f13-8717-cd9027d12a39" /> * batched and grouped experts implementations by @IlyasMoutawwakil in https://github.com/huggingface/transformers/pull/42697 * Optimize MoEs for decoding using batched | Low | 1/8/2026 |
| v5.0.0rc1 | ## What's Changed This release candidate was focused mostly on `quantization` support with the new dynamic weight loader, and a few notable 🚨 breaking changes🚨: 1. Default dtype for any model when using `from_pretrained` is now `auto`! * Default auto 🚨 🚨 by @ArthurZucker in https://github.com/huggingface/transformers/pull/42805 2. Default shard size when saving a model is now 50GB: * 🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization by @Cyrilvallez in https:/ | Low | 1/8/2026 |
| v5.0.0rc0 | ## Transformers v5 release notes <img width="1800" height="1013" alt="image" src="https://github.com/user-attachments/assets/7b5187d7-6945-4108-a546-6d1d7bfb55e3" /> - Highlights - Significant API changes: dynamic weight loading, tokenization - Backwards Incompatible Changes - Bugfixes and improvements ## Highlights We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 800 commits have been | Low | 12/1/2025 |
| v4.57.3 | There was a hidden bug when loading models with `local_files_only=True` and a typo related to the recent patch. The main fix is: https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a. We are really sorry that this slipped through, our CIs just did not catch it. As it affects a lot of users we are gonna yank the previous release | Low | 11/25/2025 |
| v4.57.2 | This patch most notably fixes an issue on some Mistral tokenizers. It contains the following commits: - Add AutoTokenizer mapping for mistral3 and ministral (#42198) - Auto convert tekken.json (#42299) - fix tekken pattern matching (#42363) - Check model inputs - hidden states (#40994) - Remove invalid `@staticmethod` from module-level get_device_and_memory_breakdown (#41747) | Low | 11/24/2025 |
| v4.57.1 | This patch most notably fixes an issue with an optional dependency (`optax`), which resulted in parsing errors with `poetry`. It contains the following fixes: - [fix optax dep issue](https://github.com/huggingface/transformers/commit/0645c9ec3188e000aecf5060e2cdabcc156bb794) - [remove offload_state_dict from kwargs](https://github.com/huggingface/transformers/commit/a92b1e8a45e1863b95c5e2caa12f5597aee80279) - Fix bnb fsdp loading for pre-quantized checkpoint (#41415) - Fix tests fsdp (#414 | Low | 10/14/2025 |
| v4.57.0 | ## New model additions ### Qwen3 Next <img width="1200" height="511" alt="image" src="https://github.com/user-attachments/assets/3abad6c4-5650-412d-a831-f8a30a5d962e" /> The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency. The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost: - **Hybrid Attention**: Replac | Low | 10/3/2025 |
| v4.56.2 | - Processor load with multi-processing (#40786) - [Jetmoe] Fix RoPE (#40819) - Fix getter regression (#40824) - Fix config dtype parsing for Emu3 edge case (#40766) | Low | 9/17/2025 |
| v4.56.1-Vault-Gemma-preview | A new model is added to transformers: Vault-Gemma It is added on top of the v4.56.1 release, and can be installed from the following tag: `v4.56.1-Vault-Gemma-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.56.1-Vault-Gemma-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag im | Low | 9/12/2025 |
| v4.56.1 | # Patch release v4.56.1 This patch most notably fixes an issue with the new `dtype` argument (replacing `torch_dtype`) in pipelines! ## Bug Fixes & Improvements - Fix broken Llama4 accuracy in MoE part (#40609) - fix pipeline dtype (#40638) - Fix self.dropout_p is not defined for SamAttention/Sam2Attention (#40667) - Fix backward compatibility with accelerate in Trainer (#40668) - fix broken offline mode when loading tokenizer from hub (#40669) - [Glm4.5V] fix vLLM support (#40696) | Low | 9/4/2025 |
| v4.56.0-Embedding-Gemma-preview | A new model is added to transformers: Embedding Gemma It is added on top of the v4.56.0 release, and can be installed from the following tag: `v4.56.0-Embedding-Gemma-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. A | Low | 9/4/2025 |
| v4.56.0 | ## New model additions ### Dino v3 DINOv3 is a family of versatile vision foundation models that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models. You can find all the original DINOv3 checkpoints under the [DINOv3](https://huggingface.co/collections/face | Low | 8/29/2025 |
| v4.55.4 | # Patch v4.55.4 There was a mix-up on our side when cherry-picking the commit for #40197, which led to a wrong commit in the patch! Sorry everyone 😭 This patch is just the official fix for #40197! | Low | 8/22/2025 |
| v4.55.3 | # Patch release 4.55.3 Focused on stabilizing FlashAttention-2 on Ascend NPU, improving FSDP behavior for generic-task models, fixing MXFP4 integration for GPT-OSS ## Bug Fixes & Improvements - FlashAttention-2 / Ascend NPU – Fix “unavailable” runtime error (#40151) by @FightingZhen - FlashAttention kwargs – Revert FA kwargs preparation to resolve regression (#40161) by @Cyrilvallez - FSDP (generic-task models) – Fix sharding/runtime issues (#40191) by @Cyrilvallez - GPT-OSS / MXFP4 – | Low | 8/21/2025 |
| v4.55.2 | # Patch release 4.55.2! ## only affects `FA2` generations! 😢 Well sorry everyone, sometimes shit can happen... 4.55.1 was broken because of 🥁 a git merge conflict. I cherry-picked https://github.com/huggingface/transformers/pull/40002 without https://github.com/huggingface/transformers/pull/40029, so `from ..modeling_flash_attention_utils import prepare_fa_kwargs_from_position_ids` was missing, and since this is only covered by a slow test, nothing caught it. Will work to remediate and write | Low | 8/13/2025 |
| v4.55.1 | # Patch release 4.55.1: Mostly focused on stabilizing MXFP4 for the GPT-OSS model! ## Bug Fixes & Improvements - Idefics2, Idefics3, SmolVLM – Fix tensor device issue (#39975) by @qgallouedec - Merge conflicts – Fix merge conflicts from previous changes by @vasqu - MXFP4 / CPU device_map – Default to dequantize when CPU is in device_map (#39993) by @MekkCyber - GPT Big Code – Fix attention scaling (#40041) by @vasqu - Windows compatibility – Resolve Triton version check compatibil | Low | 8/13/2025 |
| 4.55.0-GLM-4.5V-preview | # GLM-4.5V preview based on 4.55.0 New model added by the Z.ai team to `transformers`! [GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V) is a new multimodal reasoning model based on GLM-4.5-Air, which has 106B total and 12B active parameters. It's performant across 42 benchmarks across various categories: - Image reasoning (scene understanding, complex multi-image analysis, spatial recognition) - Video understanding (long video segmentation and event recognition) - GUI tasks (scree | Low | 8/11/2025 |
| v4.55.0 | ## Welcome GPT OSS, the new open-source model family from OpenAI! <img width="2320" height="1160" alt="image" src="https://github.com/user-attachments/assets/4a1cd2f6-dde9-445e-83d9-73f6551e2da2" /> For more detailed information about this model, we recommend reading the following blogpost: https://huggingface.co/blog/welcome-openai-gpt-oss GPT OSS is a hugely anticipated open-weights release by OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases. I | Low | 8/5/2025 |
| 4.54.1 | # Patch release 4.54.1 Quite a lot of bugs got through! The release was a bit rushed, sorry everyone! 🤗 Mostly cache fixes, as we now have a layered cache, plus fixes to distributed. - Fix Cache.max_cache_len max value for Hybrid models, @manueldeprada, @Cyrilvallez, #39737 - [modenbert] fix regression, @zucchini-nlp, #39750 - Fix version issue in modeling_utils.py, @Cyrilvallez, #39759 - Fix GPT2 with cross attention, @zucchini-nlp, #39754 - Fix mamba regression, @manueldepra | Low | 7/29/2025 |
| v4.54.0 | ## Important news! In order to become the source of truth, we recognize that we need to address two common and long-heard critiques about `transformers`: 1. `transformers` is bloated 2. `transformers` is slow Our team has focused on improving both aspects, and we are now ready to announce this. The modeling files for the standard `Llama` models are down to 500 LOC and should be much more readable, keeping just the core of the modeling and hiding the "powerful transformers features." | Low | 7/25/2025 |
| v4.53.2-Ernie-4.5-preview | Two new models are added to transformers: Ernie 4.5, and its MoE variant, Ernie 4.5 MoE. They are added on top of the v4.53.2 release, and can be installed from the following tag: `v4.53.2-Ernie-4.5-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.53.2-Ernie-4.5-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as | Low | 7/23/2025 |
| v4.53.3 | # Small patch release 4.53.3! A small patch with OpenTelemetry fixes! Sorry for the delay! - refactor: remove set_tracer_provider and set_meter_provider calls (https://github.com/huggingface/transformers/pull/39422) from @McPatate | Low | 7/22/2025 |
| v4.53.2-modernbert-decoder-preview | A new model is added to transformers: ModernBERT Decoder It is added on top of the v4.53.2 release, and can be installed from the following tag: `v4.53.2-modernbert-decoder-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.53.2-modernbert-decoder-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improv | Low | 7/16/2025 |
| v4.53.2 | This patch contains the following bug fixes: - Fix some bug for finetune and batch infer For GLM-4.1V (#39090) - [bugfix] fix flash attention 2 unavailable error on Ascend NPU (#39166) - Fix errors when use verl to train GLM4.1v model (#39199) - [pagged-attention] fix off-by-1 error in pagged attention generation (#39258) - [smollm3] add tokenizer mapping for `smollm3` (#39271) - [sliding window] revert and deprecate (#39301) - fix Glm4v batch videos forward (#39172) - Add a default va | Low | 7/11/2025 |
| v4.53.1 | This patch contains several bug fixes. The following commits are included: - Fix: unprotected import of tp plugin (#39083) - Fix key mapping for VLMs (#39029) - Several fixes for Gemma3n(#39135) - [qwen2-vl] fix FA2 inference (#39121) - [smolvlm] fix video inference (#39147) - Fix multimodal processor get duplicate arguments when receive kwargs for initialization (#39125) - when delaying optimizer creation only prepare the model (#39152) - Add packed tensor format support for flex/sdpa | Low | 7/4/2025 |
| v4.53.0 | ## Release v4.53.0 ### Gemma3n Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages. Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operat | Low | 6/26/2025 |
| v4.52.4-Kyutai-STT-preview | A new model is added to transformers: Kyutai-STT It is added on top of the v4.52.4 release, and can be installed from the following tag: `v4.52.4-Kyutai-STT-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.52.4-Kyutai-STT-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag impli | Low | 6/24/2025 |
| v4.52.4-VJEPA-2-preview | A new model is added to transformers: V-JEPA 2 It is added on top of the v4.52.4 release, and can be installed from the following tag: `v4.52.4-VJEPA-2-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.52.4-VJEPA-2-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag implies, this | Low | 6/11/2025 |
| v4.52.4-ColQwen2-preview | A new model is added to transformers: ColQwen2 It is added on top of the v4.52.4 release, and can be installed from the following tag: `v4.52.4-ColQwen2-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.52.4-ColQwen2-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag implies, th | Low | 6/2/2025 |
| v4.52.4 | The following commits are included in that patch release: - [qwen-vl] Look for vocab size in text config (#38372) - Fix convert to original state dict for VLMs (#38385) - [video utils] group and reorder by number of frames (#38374) - [paligemma] fix processor with suffix (#38365) - Protect get_default_device for torch<2.3 (#38376) - [OPT] Fix attention scaling (#38290) | Low | 5/30/2025 |
| v4.52.3 | # Patch release v4.52.3 We had to protect the imports again, a series of bad events. Here are the two prs for the patch: - Fix tp error when torch distributed is already initialized (#38294) by @SunMarc - Protect ParallelInterface (#38262) by @ArthurZucker and @LysandreJik | Low | 5/22/2025 |
| v4.52.2 | # Patch release v4.52.2 We had to revert #37877 because of a missing flag that was overriding the device map. We re-introduced the changes because they allow native 3D parallel training in Transformers. Sorry everyone for the troubles! 🤗 * Clearer error on import failure (#38257) by @LysandreJik * Verified tp plan should not be NONE (#38255) by @NouamaneTazi and @ArthurZucker | Low | 5/21/2025 |
| v4.52.1 | ## New models ### Qwen2.5-Omni <img width="1090" alt="image" src="https://github.com/user-attachments/assets/77f0fe5b-59cd-4fb6-b222-bcc2b35d6406" /> The [Qwen2.5-Omni](https://qwenlm.github.io/blog/) model is a unified multiple modalities model proposed in [Qwen2.5-Omni Technical Report](https://huggingface.co/papers/2503.20215) from Qwen team, Alibaba Group. The abstract from the technical report is the following: > We present Qwen2.5-Omni, an end-to-end multimodal model designe | Low | 5/20/2025 |
| v4.51.3-GraniteMoeHybrid-preview | A new model is added to transformers: GraniteMoeHybrid It is added on top of the v4.51.3 release, and can be installed from the following tag: `v4.51.3-GraniteMoeHybrid-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.51.3-GraniteMoeHybrid-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. | Low | 5/8/2025 |
| v4.51.3-D-FINE-preview | A new model is added to transformers: D-FINE It is added on top of the v4.51.3 release, and can be installed from the following tag: `v4.51.3-D-FINE-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.51.3-D-FINE-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag implies, this | Low | 5/8/2025 |
| v4.51.3-SAM-HQ-preview | A new model is added to transformers: SAM-HQ It is added on top of the v4.51.3 release, and can be installed from the following tag: `v4.51.3-SAM-HQ-preview`. In order to install this version, please install with the following command: ``` pip install git+https://github.com/huggingface/transformers@v4.51.3-SAM-HQ-preview ``` If fixes are needed, they will be applied to this release; this installation may therefore be considered as stable and improving. As the tag implies, this tag | Low | 5/8/2025 |
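Two API points recur in the notes above: the `dtype` argument replacing `torch_dtype` (v4.56.x), and `from_pretrained` defaulting to `dtype="auto"` in v5 (v5.0.0rc1). A minimal sketch, assuming a recent (≥ 4.56) install and using `openai-community/gpt2` purely as an illustrative checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM

# Since v5, omitting dtype is equivalent to dtype="auto":
# weights load in the dtype stored in the checkpoint.
model_auto = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# To pin a dtype explicitly, pass the `dtype` argument
# (the pre-v5 spelling of this argument was `torch_dtype`).
model_fp32 = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2", dtype=torch.float32
)
print(model_fp32.dtype)
```

On pre-v5 versions, leaving `dtype` unset instead loaded weights in `torch.float32` regardless of the checkpoint, which is the behavior change flagged as breaking in the v5.0.0rc1 notes.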
