# timm

> PyTorch Image Models

- **URL**: https://www.freshcrate.ai/projects/timm
- **Author**: pypi
- **Category**: RAG & Memory
- **Latest version**: `v1.0.27` (2026-05-08)
- **License**: Apache-2.0
- **Source**: https://github.com/huggingface/pytorch-image-models
- **Homepage**: https://pypi.org/project/timm/
- **Language**: Python
- **GitHub**: 36,678 stars, 5,145 forks
- **Registry**: pypi (`timm`)
- **Tags**: `image-classification`, `pypi`, `pytorch`

## Description

# PyTorch Image Models
- [What's New](#whats-new)
- [Introduction](#introduction)
- [Models](#models)
- [Features](#features)
- [Results](#results)
- [Getting Started (Documentation)](#getting-started-documentation)
- [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
- [Awesome PyTorch Resources](#awesome-pytorch-resources)
- [Licenses](#licenses)
- [Citing](#citing)

## What's New

## March 23, 2026
* Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse.
* Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks.
* Fix class & register token uses with ViT and no pos embed enabled.
* Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
* Improve consistency of output projection / MLP dimensions for attention pooling layers.
* Hiera model F.SDPA optimization to allow Flash Attention kernel use.
* Caution added to SGDP optimizer.
* Release 1.0.26. First maintenance release since my departure from Hugging Face.
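
The checkpoint-loading change above relies on PyTorch's `torch.load(..., weights_only=True)`, which refuses to unpickle arbitrary globals unless they have been allowlisted (for example with `torch.serialization.add_safe_globals`). The sketch below illustrates the allowlist idea using only the standard library `pickle` module. It is not timm's or PyTorch's implementation; `AllowlistUnpickler`, `ALLOWED_GLOBALS`, and `safe_loads` are hypothetical names chosen for illustration.

```python
import argparse
import io
import pickle

# Assumption for illustration: only argparse.Namespace is trusted, mirroring
# the "safe_global for ArgParse" note in the changelog.
ALLOWED_GLOBALS = {("argparse", "Namespace")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that rejects every global not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is not allowlisted"
        )


def safe_loads(data: bytes):
    """Deserialize bytes, permitting only builtin containers and allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps({"args": argparse.Namespace(lr=0.1), "epoch": 3})
    print(safe_loads(blob))
```

Plain containers and numbers load without any global lookup, so only objects that reference a class or function go through the allowlist check.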

## Feb 23, 2026
* Add token distillation training support to distillation task wrappers
* Remove some torch.jit usage in prep for official deprecation
* Caution added to AdamP optimizer
* Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
* Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
* Release 1.0.25

## Jan 21, 2026
* **Compat Break**: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`)
  * Does not impact any trained `timm` models but could impact downstream use.

## Jan 5 & 6, 2026
* Release 1.0.24
* Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
* Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
* Release 1.0.23
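
For downstream code that must run across several `timm` versions, one possible pattern is to prefer the current `timm.layers` path and fall back to the deprecated `timm.models.layers` path only if needed. This is a sketch, not something the release notes prescribe; the helper name `import_first_available` is hypothetical.

```python
import importlib


def import_first_available(candidates):
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


if __name__ == "__main__":
    # Prefer the current path, fall back to the deprecated one.
    layers = import_first_available(("timm.layers", "timm.models.layers"))
    print("timm layers module:", layers.__name__ if layers else "timm not installed")
```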

## Dec 30, 2025
* Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
  * https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1)
  * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1)
  * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1)
* Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  * https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  * https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
* Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init. 
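
The Hugging Face Hub links listed above follow the usual `timm` naming scheme of `<architecture>.<pretrained_tag>`. The sketch below extracts the model name from such a URL and splits it into its two parts; the helper names `model_name_from_hub_url` and `split_model_name` are hypothetical and not part of `timm`.

```python
from urllib.parse import urlparse


def model_name_from_hub_url(url: str) -> str:
    """Return the repository name (last path segment) of a Hub model URL."""
    path = urlparse(url).path.strip("/")
    _org, _, repo = path.partition("/")
    return repo


def split_model_name(name: str):
    """Split '<architecture>.<pretrained_tag>' at the first dot."""
    arch, _, tag = name.partition(".")
    return arch, tag


if __name__ == "__main__":
    url = "https://huggingface.co/timm/csatv2_21m.sw_r640_in1k"
    name = model_name_from_hub_url(url)
    print(name, split_model_name(name))
```

With `timm` installed, a name obtained this way can typically be passed to `timm.create_model(name, pretrained=True)`, assuming the installed release includes that model and the weights can be downloaded.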
  
## Dec 12, 2025
* Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
* Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
* End of year PR cleanup, merge aspects of several long open PRs
  * Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
    * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_in1k
    * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_in1k
  * Add a few pooling modules, `LsePlus` and `SimPool`
  * Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
* Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
  
## Dec 1, 2025
* Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
* Remove old APEX AMP support
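
As background for the logits distillation mentioned above, the following is a minimal sketch of the classic temperature-scaled KL divergence loss between teacher and student logits. It is written in plain Python for a single sample and is not taken from `timm`'s task code; the function names and the default temperature of 2.0 are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    print(logit_distillation_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0]))
```

The `T^2` factor is the conventional scaling that keeps gradient magnitudes comparable across temperatures. A real training setup would usually compute this on batched tensors and combine it with the ordinary classification loss.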

## Nov 4, 2025
* Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
* Release 1.0.22

## Oct 31, 2025 🎃
* Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
* EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

## Oct 16-20, 2025
* Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
  * extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  * small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  * by default uses AdamW (or NAdamW if `nesterov=True`) updates if muon not suitable for parameter shape (or excluded via param group flag)
  * like torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
  * select from several NS coefficient presets or specify your own vi
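
The NS (Newton-Schulz) iterations mentioned above approximately orthogonalize a 2D update matrix. The sketch below shows the quintic iteration in plain Python on nested lists. It assumes the coefficients `(3.4445, -4.7750, 2.0315)` and 5 steps used in the referenced KellerJordan/Muon repository; whether these match any particular `timm` preset is not stated in the notes above, and the function names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(grad, steps=5, coeffs=(3.4445, -4.7750, 2.0315), eps=1e-7):
    """Approximately orthogonalize `grad` with a quintic Newton-Schulz iteration."""
    a, b, c = coeffs
    # Normalize by the Frobenius norm so all singular values are at most 1.
    norm = math.sqrt(sum(v * v for row in grad for v in row))
    x = [[v / (norm + eps) for v in row] for row in grad]
    # Work on the wide orientation so the Gram matrix is the smaller one.
    transposed = len(x) > len(x[0])
    if transposed:
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))
        gram_sq = matmul(gram, gram)
        poly = [
            [b * g + c * g2 for g, g2 in zip(g_row, g2_row)]
            for g_row, g2_row in zip(gram, gram_sq)
        ]
        poly_x = matmul(poly, x)
        x = [
            [a * xv + pv for xv, pv in zip(x_row, p_row)]
            for x_row, p_row in zip(x, poly_x)
        ]
    return transpose(x) if transposed else x


if __name__ == "__main__":
    for row in newton_schulz([[3.0, 0.0], [0.0, 1.0]]):
        print(row)
```

With these coefficients the singular values are pushed into a band around 1 rather than converging exactly to 1, which is the intended trade-off of this particular iteration.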

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v1.0.27` | 2026-05-08 | High | **April 23, 2026:** Add Gemma4 ViT encoders w/ NaFlex pipeline support (variable aspect/size per image), thanks [Yonghye Kwon](https://github.com/developer0hye). Support DINOv3 weights in NaFlexVit, thanks [Yonghye Kwon](https://github.com/developer0hye). Some improvements to Muon fallback (AdamW/NadamW) lr behavior. **What's Changed:** 🔒 Pin GitHub Actions to commit SHAs by @paulinebm in https://github.com/huggingface/pytorch-image-models/pull/2689. Improve fallback (adamw/nadamw) LR |
| `1.0.26` | 2026-04-21 | Low | Imported from PyPI (1.0.26) |
| `v1.0.26` | 2026-03-23 | Medium | **March 23, 2026:** Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse. Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks. Fix class & register token uses with ViT and no pos embed enabled. Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr). Improve consistency of output projection / |

## Citation

- HTML: https://www.freshcrate.ai/projects/timm
- Markdown: https://www.freshcrate.ai/projects/timm.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/timm/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._