Description
# PyTorch Image Models

- [What's New](#whats-new)
- [Introduction](#introduction)
- [Models](#models)
- [Features](#features)
- [Results](#results)
- [Getting Started (Documentation)](#getting-started-documentation)
- [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
- [Awesome PyTorch Resources](#awesome-pytorch-resources)
- [Licenses](#licenses)
- [Citing](#citing)

## What's New

## March 23, 2026
* Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse.
* Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks.
* Fix class & register token use with ViT when no pos embed is enabled.
* Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
* Improve consistency of output projection / MLP dimensions for attention pooling layers.
* Hiera model F.SDPA optimization to allow Flash Attention kernel use.
* Caution added to SGDP optimizer.
* Release 1.0.26. First maintenance release since my departure from Hugging Face.

## Feb 23, 2026
* Add token distillation training support to distillation task wrappers
* Remove some torch.jit usage in prep for official deprecation
* Caution added to AdamP optimizer
* Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
* Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
* Release 1.0.25

## Jan 21, 2026
* **Compat Break**: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`)
  * Does not impact any trained `timm` models but could impact downstream use.
## Jan 5 & 6, 2026
* Release 1.0.24
* Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
* Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
* Release 1.0.23

## Dec 30, 2025
* Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
  * https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1)
  * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1)
  * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1)
* Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  * https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  * https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
* Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.

## Dec 12, 2025
* Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
* Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
* End of year PR cleanup, merge aspects of several long open PRs
* Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee ViTs
  * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_in1k
  * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_in1k
* Add a few pooling modules, `LsePlus` and `SimPool`
* Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
* Bump unit tests to PyTorch 2.9.1 + Python 3.13 on the upper end; lower bound remains PyTorch 1.13 + Python 3.10

## Dec 1, 2025
* Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
* Remove old APEX AMP support

## Nov 4, 2025
* Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
* Release 1.0.22

## Oct 31, 2025 🎃
* Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
* EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

## Oct 16-20, 2025
* Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
  * extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  * small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  * by default uses AdamW (or NAdamW if `nesterov=True`) updates if Muon is not suitable for a parameter shape (or excluded via param group flag)
  * like the torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
  * select from several NS coefficient presets or specify your own vi
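The Newton–Schulz (NS) iterations referenced in the Muon notes above can be sketched as follows. This is a simplified stand-alone version using the quintic coefficients from Keller Jordan's reference implementation, not timm's exact (fused-op, DTensor-aware) code:

```python
import torch


def ns_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D gradient via quintic Newton-Schulz."""
    a, b, c = 3.4445, -4.7750, 2.0315   # reference Muon quintic coefficients
    x = g / (g.norm() + eps)            # scale so spectral norm <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:                      # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # singular values map s -> a*s + b*s^3 + c*s^5, driving them toward 1
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


grad = torch.randn(16, 32)
update = ns_orthogonalize(grad)
```

In the optimizer, an update like this replaces the raw (momentum-buffered) gradient for 2D weight matrices, while parameters whose shapes don't suit orthogonalization fall back to AdamW/NAdamW updates as described in the notes above.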
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 1.0.26 | Imported from PyPI (1.0.26) | Low | 4/21/2026 |
| v1.0.26 | ## March 23, 2026 * Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse. * Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks. * Fix class & register token uses with ViT and no pos embed enabled. * Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr). * Improve consistency of output projection / | Medium | 3/23/2026 |
| v1.0.25 | ## Feb 23, 2026 * Add token distillation training support to distillation task wrappers * Remove some torch.jit usage in prep for official deprecation * Caution added to AdamP optimizer * Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights * Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor) * Release 1.0.25 ## Jan 21, 2026 * **Compat Break**: Fix oversight w/ QKV vs MLP | Low | 2/23/2026 |
| v1.0.24 | ## Jan 5 & 6, 2026 * Patch Release 1.0.24 (fix for 1.0.23) * Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1 * Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports * Release 1.0.23 ## Dec 30, 2025 * Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs * https://huggingface.co/timm/vit_dlittle_patch16_reg1_ga | Low | 1/7/2026 |
| v1.0.23 | ## Dec 30, 2025 * Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs * https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1) * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1) * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1) * Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640 | Low | 1/5/2026 |
| v1.0.22 | Patch release for priority LayerScale initialization regression in 1.0.21 ## What's Changed * Add some weights for efficientnet_x / efficientnet_h models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2602 * Update result csvs by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2603 * Fix LayerScale ignoring init_values by @Ilya-Fradlin in https://github.com/huggingface/pytorch-image-models/pull/2605 ## New Contributors * @Ilya-Fradlin m | Low | 11/5/2025 |
| v1.0.21 | ## Oct 16-20, 2025 * Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations * extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization * small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops * by default uses AdamW (or NAdamW if `nesterov=True`) updates if muon not suitable for parameter shape (or excluded via param group flag) * like torch imp | Low | 10/24/2025 |
| v1.0.20 | ## Sept 21, 2025 * Remap DINOv3 ViT weight tags from `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`) * Release 1.0.20 ## Sept 17, 2025 * DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to existing `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl * HuggingFace Hub: https://huggingface.co/collections/timm/timm-dinov3-68cb08bb0bee365973d52a | Low | 9/21/2025 |
| v1.0.19 | Patch release for Python 3.9 compat break in 1.0.18 ## July 23, 2025 * Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models. * Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0 * Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release. ## July 21, 2025 * ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, | Low | 7/24/2025 |
| v1.0.18 | ## July 23, 2025 * Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models. * Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0 ## July 21, 2025 * ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can be now loaded in NaFlexViT when `use_naflex=True` passed at model creation time * More Meta PE ViT encoders a | Low | 7/23/2025 |
| v1.0.17 | ## July 7, 2025 * MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights) * Add stem bias (zero'd in updated weights, compat break with old weights) * GELU -> GELU (tanh approx). A minor change to be closer to JAX * Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold * Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32 * Some typi | Low | 7/10/2025 |
| v1.0.16 | ## June 26, 2025 * MobileNetV5 backbone (w/ encoder only variant) for [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n#parameters) image encoder * Version 1.0.16 released ## June 23, 2025 * Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl). * Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl). * Add 3 initial nat | Low | 6/26/2025 |
| v1.0.15 | ## Feb 21, 2025 * SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9) * Variable resolution / aspect NaFlex versions are a WIP * Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training. * `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - 88.1% top-1 * `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - 87.9% top-1 * `vit_so150m2_patch16_r | Low | 2/23/2025 |
| v1.0.14 | ## Jan 19, 2025 * Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated * Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1 * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1 * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k` * Misc typing, typo, etc. cleanup * 1.0.14 release | Low | 1/19/2025 |
| v1.0.13 | ## Jan 9, 2025 * Add support to train and validate in pure `bfloat16` or `float16` * `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name * Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts) * 1.0.13 release ## Jan 6, 2025 * Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env. ## Dec 31, 2024 * | Low | 1/9/2025 |
| v1.0.12 | ## Nov 28, 2024 * More optimizers * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS) * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer) * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW * Cleanup some docstrings and type ann | Low | 12/3/2024 |
| v1.0.11 | Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested. ## Oct 16, 2024 * Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible * Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448` ### Oct 14, 2024 * Pre-activation (ResNetV2) version of 18/18d/34/34d R | Low | 10/16/2024 |
| v1.0.10 | ### Oct 14, 2024 * Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending) * Release 1.0.10 ### Oct 11, 2024 * MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights. |model |im | Low | 10/15/2024 |
| v1.0.9 | ### Aug 21, 2024 * Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models | model | top1 | top5 | param_count | img_size | | -------------------------------------------------- | ------ | ------ | ----------- | -------- | | [vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 87.438 | 98.256 | 64.11 | 384 | | [vit_medi | Low | 8/23/2024 |
| v1.0.8 | ### July 28, 2024 * Add `mobilenet_edgetpu_v2_m` weights w/ `ra4` mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256. * Release 1.0.8 ### July 26, 2024 * More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models | model |top1 |top1_err|top5 |top5_err|param_count|img_size| |---------------------------------------------------------------------------- | Low | 7/29/2024 |
| v1.0.7 | ### June 12, 2024 * MobileNetV4 models and initial set of `timm` trained weights added: | model |top1 |top1_err|top5 |top5_err|param_count|img_size| |--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------| | [mobilenetv4_hybrid_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.e600_r384_i | Low | 6/19/2024 |
| v1.0.3 | ### May 14, 2024 * Support loading PaliGemma jax weights into SigLIP ViT models with average pooling. * Add Hiera models from Meta (https://github.com/facebookresearch/hiera). * Add `normalize=` flag for transforms, return non-normalized torch.Tensor with original dtype (for `chug`) * Version 1.0.3 release ### May 11, 2024 * `Searching for Better ViT Baselines (For the GPU Poor)` weights and vit variants released. Exploring model shapes between Tiny and Base. | model | top1 | top5 | pa | Low | 5/15/2024 |
| v0.9.16 | ### Feb 19, 2024 * Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT * HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by [SeeFun](https://github.com/seefun) * Removed setup.py, moved to pyproject.toml based build supported by PDM * Add updated model EMA impl using _for_each for less overhead * Support device args in train script for non GPU devices * Other misc fixes and small additions * Min supported Python version incr | Low | 2/19/2024 |
| v0.9.12 | ### Nov 23, 2023 * Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun) * Fix Python 3.7 compat, will be dropping support for it soon * Other misc fixes * Release 0.9.12 | Low | 11/24/2023 |
| v0.9.11 | ### Nov 20, 2023 * Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation. * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035 * Updated imagenet eval and test set csv files with latest models * `vision_transformer.py` typing and | Low | 11/20/2023 |
| v0.9.10 | ### Nov 4, 2023 * Patch fix for 0.9.9 to fix FrozenBatchnorm2d import path for old torchvision (~2 years old) ### Nov 3, 2023 * [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added * DINOv2 'register' ViT model weights added * Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient) * Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a | Low | 11/4/2023 |
| v0.9.9 | ### Nov 3, 2023 * [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added * DINOv2 'register' ViT model weights added * Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient) * Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w) * ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge` * 0.9.9 release | Low | 11/3/2023 |
| v0.9.8 | ### Oct 20, 2023 * [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`. * Great potential for fine-tune and downstream feature use. * Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588) * Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm) * Add patch resizing support (on pretrained weight load) to Swin models * 0.9.8 re | Low | 10/21/2023 |
| v0.9.7 | Small bug fix & extra model from [v0.9.6](https://github.com/huggingface/pytorch-image-models/releases/tag/v0.9.6) ### Sep 1, 2023 * TinyViT added by [SeeFun](https://github.com/seefun) * Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10 * 0.9.7 release | Low | 9/2/2023 |
| v0.9.6 | ### Aug 28, 2023 * Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat. * Add `dynamic_img_size=True` to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass). * Add `dynamic_img_pad=True` to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass). * Enabling either dynamic mo | Low | 8/29/2023 |
| v0.9.5 | Minor updates and bug fixes. New ResNeXT w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (`seresnextaa201d_32x8d.sw_in12k_ft_in1k_384`) ### Aug 3, 2023 * Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by [SeeFun](https://github.com/seefun) * Fix `selecsls*` model naming regression * Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize) * v0.9.5 release prep ### July 27, 2023 * Added | Low | 8/3/2023 |
| v0.9.2 | * Fix _hub deprecation pass through import | Low | 5/14/2023 |
| v0.9.1 | The first non pre-release since Oct 2022 with a long list of changes from 0.6.x releases... ### May 12, 2023 * Fix Python 3.7 import error re Final[] typing annotation ### May 11, 2023 * `timm` 0.9 released, transition from 0.8.xdev releases ### May 10, 2023 * Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm` * DINOv2 vit feature backbone weights added thanks to [Leng Yue](https://github.com/leng-yue) * FB MAE vit featur | Low | 5/12/2023 |
| v0.9.0 | First non pre-release in a loooong while, changelog from 0.6.x below... ### May 11, 2023 * `timm` 0.9 released, transition from 0.8.xdev releases ### May 10, 2023 * Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm` * DINOv2 vit feature backbone weights added thanks to [Leng Yue](https://github.com/leng-yue) * FB MAE vit feature backbone weights added * OpenCLIP DataComp-XL L/14 feat backbone weights added * MetaFormer (p | Low | 5/12/2023 |
| v0.6.13 | Release from 0.6.x stable branch with fix for Python 3.11. NOTE original 0.6.13 release tag was against wrong branch. | Low | 4/16/2023 |
| v0.8.17dev0 | ### March 22, 2023 * More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py` * Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs. * FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extra | Low | 3/24/2023 |
| v0.8.13dev0 | ### Feb 20, 2023 * Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune * 0.8.13dev0 pypi release for latest changes w/ move to huggingface org ### Feb 16, 2023 * `safetensor` checkpoint support added * Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block * Add F.scaled_dot_product_attention support (PyTorch 2. | Low | 2/20/2023 |
| v0.8.10dev0 | ### Feb 7, 2023 * New inference benchmark numbers added in [results](results/) folder. * Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes * `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256 * `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384 * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256 * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384 * Add DaViT models. Supports `features_only=Tr | Low | 2/7/2023 |
| v0.8.6dev0 | ### Jan 11, 2023 * Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags) * `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released) * `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288 * `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288 ### Jan 6, 2023 * Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line * `trai | Low | 1/12/2023 |
| v0.8.2dev0 | Part way through the conversion of models to multi-weight support (`model_arch.pretrain_tag`), module reorg for future building, and lots of new weights and model additions as we go... This is considered a development release. Please stick to 0.6.x if you need stability. Some of the model names and tags will shift a bit, and some old names have already been deprecated without remapping support added yet. For code, the 0.6.x branch is considered 'stable' https://github.com/rwightman/pytorch-image-models/t | Low | 12/24/2022 |
| v0.6.12 | Minor bug fixes to HF push_to_hub, plus some more MaxVit weights ### Oct 10, 2022 * More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments: * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm) * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN) * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G) * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) | Low | 11/23/2022 |
| v0.6.11 | ## Changes Since 0.6.7 ### Sept 23, 2022 * CLIP LAION-2B pretrained B/32, L/14, H/14, and g/14 image tower weights as vit models (for fine-tune) ### Sept 7, 2022 * Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future * Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2 * Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` vari | Low | 10/3/2022 |
| v0.1-weights-maxx | # CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` trained weights Weights were created reproducing the paper architectures and exploring timm specific additions such as ConvNeXt blocks, parallel partitioning, and other experiments. Weights were trained on a mix of TPU and GPU systems. Bulk of weights were trained on TPU via the TRC program (https://sites.research.google/trc/about/). CoAtNet variants run particularly well on TPU, it's a gr | Low | 8/24/2022 |
| v0.1-weights-morevit | # More weights for 3rd party ViT / ViT-CNN hybrids that needed remapping / re-hosting ## EfficientFormer Rehosted and remapped checkpoints from https://github.com/snap-research/EfficientFormer (originals in Google Drive) ## GCViT Heavily remapped from originals at https://github.com/NVlabs/GCVit due to from-scratch re-write of model code NOTE: these checkpoints have a non-commercial [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. | Low | 8/17/2022 |
| v0.6.7 | Minor bug fixes and a few more weights since 0.6.5 * A few more weights & model defs added: * `darknetaa53` - 79.8 @ 256, 80.5 @ 288 * `convnext_nano` - 80.8 @ 224, 81.5 @ 288 * `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288 * `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288 * `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288 * `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288 * `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320 * `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC | Low | 7/27/2022 |
| v0.6.5 | First official release in a long while (since 0.5.4). All changes since 0.5.4 are logged below: ### July 8, 2022 More models, more fixes * Official research models (w/ weights) added: * EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt) * MobileViT-V2 from (https://github.com/apple/ml-cvnets) * DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit) * My own models: * Small `ResNet` defs added by request with 1 block repeats for both basic and bottleneck (resnet1 | Low | 7/10/2022 |
| v0.1-weights-swinv2 | This release holds weights for timm's variant of Swin V2 (from @ChristophReich1996 impl, https://github.com/ChristophReich1996/Swin-Transformer-V2) NOTE: `ns` variants of the models have extra norms on the main branch at the end of each stage, this seems to help training. The current `small` model is not using this, but one is currently training. Will have a non-ns tiny soon as well as a comparison. in21k and 1k base models are also in the works... `small` checkpoints trained on TPU-VM instan | Low | 4/3/2022 |
| v0.1-tpu-weights | A wide range of mid-large sized models trained in PyTorch XLA on TPU VM instances. Demonstrating viability of the TPU + PyTorch combo for excellent image model results. All models trained w/ the `bits_and_tpu` branch of this codebase. A big thanks to the TPU Research Cloud (https://sites.research.google/trc/about/) for the compute used in these experiments. This set includes several novel weights, including EvoNorm-S RegNetZ (C/D timm variants) and ResNet-V2 model experiments, as well as | Low | 3/18/2022 |
| v0.1-mvit-weights | Pretrained weights for MobileViT and MobileViT-V2 adapted from Apple impl at https://github.com/apple/ml-cvnets Checkpoints remapped to `timm` impl of the model with BGR corrected to RGB (for V1). | Low | 1/31/2022 |
| v0.5.4 | Release v0.5.4 | Low | 1/17/2022 |
| v0.1-rsb-weights | # Weights for ResNet Strikes Back Paper: https://arxiv.org/abs/2110.00476 More details on weights and hparams to come... | Low | 10/4/2021 |
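The `torch.utils.checkpoint.checkpoint()` wrapper mentioned in the v1.0.13 notes above defaults `use_reentrant=False` (unless `TIMM_REENTRANT_CKPT=1` is set). A sketch of the underlying PyTorch call it wraps, not timm's wrapper itself:

```python
import torch
from torch.utils.checkpoint import checkpoint


def block(x):
    # Activations inside this function are recomputed during backward
    # instead of being kept alive, trading compute for memory.
    return torch.relu(x * 2.0)


x = torch.randn(8, 16, requires_grad=True)
# use_reentrant=False selects the non-reentrant implementation, which
# composes better with autograd features (keyword args, nested grad, hooks).
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

Gradients flow through the checkpointed region exactly as they would without it; only the memory/compute trade-off changes.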
