# transformers

> Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

- **URL**: https://www.freshcrate.ai/projects/transformers
- **Author**: The Hugging Face team with the help of all our contributors (https://github.com/hu
- **Category**: Frameworks
- **Latest version**: `v5.10.1` (2026-06-03)
- **License**: Apache 2.0 License
- **Source**: https://github.com/huggingface/transformers
- **Language**: Python
- **GitHub**: 159,705 stars, 32,961 forks
- **Registry**: pypi (`transformers`)
- **Tags**: `deep-learning`, `llm`, `machine-learning`, `nlp`, `pypi`, `python`, `pytorch`, `transformer`, `vlm`

## Description


State-of-the-art pretrained models for inference and training

Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer vision, audio, video, and multimodal models, for both inference and training. It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v5.10.1` | 2026-06-03 | High | # Release v5.10.1 v5.10.0 was yanked as we publish on a corrupted branch. Sorry everyone, this happens when we rush a release!!! ## New Model additions ### Gemma4 unified+ Gemma4 MTP Gemma 4 12B Unified is an **encoder-free** multimodal model with pretrained and instruction-tuned variants. Unlike [standard Gemma 4](./gemma4), which uses dedicated encoder |
| `v5.9.0` | 2026-05-20 | High | # Release v5.9.0 ## New Model additions ### Cohere2Moe Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences. **Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/cohere2_moe) * Add new cohere2_moe model (#4611 |
| `v5.8.1` | 2026-05-13 | High | # Patch release v5.8.1 This release is mainly to fix the Deepseek V4 integration!!! * [fix] Add fatal_error to ContinuousBatchingManager so the serving... by @qgallouedec, @remi-or * Fix WeightConverter regex incorrectly matching shared_experts as experts by @silencelamb, @claude * Fix deepseek v4 by @ArthurZucker (#45892) * Deepseek v4 csa mask collaps |
| `v5.8.0` | 2026-05-05 | High | # Release v5.8.0 ## New Model additions ### DeepSeek-V4 DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The architecture replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections fo |
| `v5.7.0` | 2026-04-28 | High | # Release v5.7.0 ## New Model additions ### Laguna Laguna is Poolside's mixture-of-experts language model family that extends standard SwiGLU MoE transformers with two key innovations. It features per-layer head counts allowing different decoder layers to have different query-head counts while sharing the same KV cache shape, and implements a sigmoid Mo |
| `v5.6.2` | 2026-04-23 | High | # Patch release v5.6.2 Qwen 3.5 and 3.6 MoE (text-only) were broken when using with FP8. It should now work again with this :saluting_face: * Fix configuration reading and error handling for kernels (https://github.com/huggingface/transformers/pull/45610) by @hmellor **Full Changelog**: https://github.com/huggingface/transformers/compare/v5.6.1...v5.6.2 |
| `v5.6.0` | 2026-04-22 | High | # Release v5.6.0 ## New Model additions ### OpenAI Privacy Filter OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitization workflows where teams need a model that they can run on-premises that is fast, context-aware, and tunable. The model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi pr |
| `5.5.4` | 2026-04-21 | Low | Imported from PyPI (5.5.4) |
| `v5.5.4` | 2026-04-13 | Medium | # Patch release v5.5.4 This is mostly some fixes that are good to have asap, mostly for tokenizers; ** Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (#45305) by ArthurZucker For training: ** Fix #45305 + add regression test GAS (#45349) by florian6973, SunMarc ** Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#…) by ArthurZucker And for Qwen2.5-VL : ** Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by Kash6, zucchi |
| `v5.5.3` | 2026-04-09 | Medium | Small patch release to fix `device_map` support for Gemma4! It contains the following commit: - [gemma4] Fix device map auto (#45347) by @Cyrilvallez |

## Citation

- HTML: https://www.freshcrate.ai/projects/transformers
- Markdown: https://www.freshcrate.ai/projects/transformers.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/transformers/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._