# torchao

> Package for applying ao techniques to GPU models

- **URL**: https://www.freshcrate.ai/projects/torchao
- **Author**: pypi
- **Category**: RAG & Memory
- **Latest version**: `0.17.0` (2026-04-21)
- **License**: Unknown
- **Source**: https://github.com/pytorch/ao
- **Language**: Python
- **GitHub**: 2,790 stars, 493 forks
- **Registry**: pypi (`torchao`)
- **Tags**: `pypi`

## Description

<div align="center">

# TorchAO

</div>

### PyTorch-Native Training-to-Serving Model Optimization
- Pre-train Llama-3.1-70B **1.5x faster** with float8 training
- Recover **67% of the accuracy degradation** from quantization on Gemma3-4B with QAT
- Quantize Llama-3-8B to int4 for **1.89x faster** inference with **58% less memory**
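
The int4 figures above come from torchao's own kernels. The sketch below only illustrates the arithmetic behind group-wise, weight-only int4 quantization, under assumptions of our own: asymmetric min/max quantization to codes 0..15, one scale and zero point per group of 32 weights, and 16-bit metadata per scale and zero point. The quantize-then-dequantize round trip it performs is also what QAT simulates ("fake quantization") during fine-tuning, so the model learns to tolerate the rounding error. All function names are hypothetical and are not torchao APIs.

```python
# Minimal sketch of group-wise int4 weight-only quantization in pure Python.
# Assumptions (ours, not torchao's): asymmetric min/max quantization to unsigned
# 4-bit codes 0..15, one (scale, zero_point) pair per group, 16-bit metadata.
# All function names are hypothetical.

def quantize_group(weights, n_bits=4):
    """Map a list of floats to integer codes plus a (scale, zero_point) pair."""
    qmax = 2 ** n_bits - 1                    # 15 for int4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0     # fall back to 1.0 for constant groups
    zero_point = round(-w_min / scale)        # offset so that code == zero_point means 0.0
    codes = [min(max(round(w / scale) + zero_point, 0), qmax) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Recover approximate floats; QAT's fake quantization performs this same round trip."""
    return [(q - zero_point) * scale for q in codes]


def quantize_row(row, group_size=32):
    """Quantize one weight row group by group into (codes, scale, zero_point) tuples."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


def int4_bytes(num_weights, group_size=32, meta_bytes_per_group=4):
    """Bytes for packed 4-bit codes (two per byte) plus per-group scale and zero point."""
    return num_weights // 2 + (num_weights // group_size) * meta_bytes_per_group
```

For a 4096 x 4096 weight matrix, `int4_bytes(4096 * 4096)` returns 10,485,760 bytes versus 33,554,432 bytes in bfloat16 (about 31%). Weight-only arithmetic like this ignores activations, the KV cache, and any layers left unquantized, so it should not be expected to reproduce the end-to-end memory figure above.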

<div align="center">

[![](https://img.shields.io/badge/CodeML_%40_ICML-2025-blue)](https://openreview.net/attachment?id=HpqH0JakHf&name=pdf)
[![](https://dcbadge.vercel.app/api/server/gpumode?style=flat&label=TorchAO%20in%20GPU%20Mode)](https://discord.com/channels/1189498204333543425/1205223658021458100)
[![](https://img.shields.io/github/contributors-anon/pytorch/ao?color=yellow&style=flat-square)](https://github.com/pytorch/ao/graphs/contributors)
[![](https://img.shields.io/badge/torchao-documentation-blue?color=DE3412)](https://docs.pytorch.org/ao/stable/index.html)
[![license](https://img.shields.io/badge/license-BSD_3--Clause-lightgrey.svg)](./LICENSE)


</div>


## 📣 Latest News

- [Oct 25] QAT is now integrated into [Unsloth](https://docs.unsloth.ai/new/quantization-aware-training-qat) for both full and LoRA fine-tuning! Try it out using [this notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_%284B%29_Instruct-QAT.ipynb).
- [Oct 25] MXFP8 MoE training prototype achieved a **~1.45x speedup** for the MoE layer in Llama4 Scout and **~1.25x** for the MoE layer in DeepSeekV3 671B, with numerics comparable to bfloat16! Check out the [docs](./torchao/prototype/moe_training/) to try it out.
- [Sept 25] MXFP8 training achieved a [1.28x speedup on a Crusoe B200 cluster](https://pytorch.org/blog/accelerating-2k-scale-pre-training-up-to-1-28x-with-torchao-mxfp8-and-torchtitan-on-crusoe-b200-cluster/) with a loss curve virtually identical to bfloat16!
- [Sept 19] [TorchAO Quantized Model and Quantization Recipes Now Available on Huggingface Hub](https://pytorch.org/blog/torchao-quantized-models-and-quantization-recipes-now-available-on-huggingface-hub/)!
- [Jun 25] Our [TorchAO paper](https://openreview.net/attachment?id=HpqH0JakHf&name=pdf) was accepted to CodeML @ ICML 2025!
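
Several items above mention MXFP8. In the OCP Microscaling (MX) formats, each block of 32 elements shares one power-of-two scale (stored as E8M0), and the elements themselves are FP8 (here, E4M3). The pure-Python sketch below shows that idea. The floor-based scale rule follows our reading of the OCP MX spec; torchao's prototype may round scales differently, and the function names are hypothetical.

```python
import math

# Minimal sketch of MXFP8 block scaling in pure Python (assumptions noted inline).
E4M3_MAX = 448.0   # largest finite float8 e4m3fn value
E4M3_EMAX = 8      # exponent of that value: 448 = 1.75 * 2**8
BLOCK_SIZE = 32    # elements that share one power-of-two scale


def floor_log2(x):
    """floor(log2(x)) for x > 0, computed exactly via frexp."""
    return math.frexp(x)[1] - 1


def round_to_e4m3(x):
    """Round to the nearest e4m3 value (3 mantissa bits), saturating at +-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    step = 2.0 ** (max(floor_log2(mag), -6) - 3)   # value spacing; fixed 2**-9 for subnormals
    return math.copysign(min(round(mag / step) * step, E4M3_MAX), x)


def mxfp8_quantize(values):
    """Split into blocks; return (scale_exponent, e4m3 elements) per block."""
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        block = values[i:i + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Assumed floor-based rule: scale = 2**(floor(log2(amax)) - emax). The scaled amax
        # then lands in [256, 512), so values above 448 saturate.
        exp = floor_log2(amax) - E4M3_EMAX if amax > 0 else 0
        blocks.append((exp, [round_to_e4m3(v / 2.0 ** exp) for v in block]))
    return blocks


def mxfp8_dequantize(blocks):
    """Multiply every element by its block's power-of-two scale."""
    return [q * 2.0 ** exp for exp, elems in blocks for q in elems]
```

Because the shared scale is a power of two, applying it only shifts exponents; the cost of the floor rule is that a block whose largest value scales above 448 saturates (for example, a lone `1.9` comes back as `1.75`).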


<details>
  <summary>Older news</summary>

- [May 25] QAT is now integrated into [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) for fine-tuning ([docs](https://docs.axolotl.ai/docs/qat.html))!
- [Apr 25] Float8 rowwise training yielded a [1.34-1.43x training speedup](https://pytorch.org/blog/accelerating-large-scale-training-and-convergence-with-pytorch-float8-rowwise-on-crusoe-2k-h200s/) at 2k H200 GPU scale
- [Apr 25] TorchAO was added as a [quantization backend to vLLM](https://docs.vllm.ai/en/latest/features/quantization/torchao.html)!
- [Mar 25] Our [2:4 Sparsity paper](https://openreview.net/pdf?id=O5feVk7p6Y) was accepted to SLLM @ ICLR 2025!
- [Jan 25] Our [integration with GemLite and SGLang](https://pytorch.org/blog/accelerating-llm-inference/) yielded 1.1-2x faster inference with int4 and float8 quantization across different batch sizes and tensor parallel sizes
- [Jan 25] We added [1-8 bit ARM CPU kernels](https://pytorch.org/blog/hi-po-low-bit-operators/) for linear and embedding ops
- [Nov 24] We achieved [1.43-1.51x faster pre-training](https://pytorch.org/blog/training-using-float8-fsdp2/) on Llama-3.1-70B and 405B using float8 training
- [Oct 24] TorchAO was added as a quantization backend to HF Transformers!
- [Sep 24] We officially launched TorchAO. Check out our blog [here](https://pytorch.org/blog/pytorch-native-architecture-optimization/)!
- [Jul 24] QAT [recovered up to 96% of the accuracy degradation](https://pytorch.org/blog/quantization-aware-training/) from quantization on Llama-3-8B
- [Jun 24] Semi-structured 2:4 sparsity [achieved 1.1x inference speedup and 1.3x training speedup](https://pytorch.org/blog/accelerating-neural-network-training/) on the SAM and ViT models respectively
- [Jun 24] Block sparsity [achieved a 1.46x training speedup](https://pytorch.org/blog/speeding-up-vits/) on the ViT model with <2% drop in accuracy

</details>
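
The 2:4 sparsity items above rely on a fixed pattern: in every group of four consecutive weights, at most two are nonzero. The sketch below prunes by magnitude (keep the two largest of each four) and stores the survivors with 2-bit positions. Magnitude-based selection is an assumption on our part; this is not torchao's sparsity API or kernel, and the function names are hypothetical.

```python
# Minimal sketch of 2:4 semi-structured magnitude pruning in pure Python.
# Assumption: keep the two largest-magnitude weights in every group of four and store
# them with 2-bit positions. Function names are hypothetical.

def prune_2_4(row):
    """Return (values, positions): the 2 kept weights per group of 4 and their indices 0..3."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # pick the two largest magnitudes, then restore their left-to-right order
        keep = sorted(sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2])
        values.extend(group[j] for j in keep)
        positions.extend(keep)
    return values, positions


def densify_2_4(values, positions):
    """Rebuild the dense row, with zeros in the pruned slots."""
    dense = []
    for g in range(0, len(values), 2):
        block = [0.0] * 4
        for k in (g, g + 1):
            block[positions[k]] = values[k]
        dense.extend(block)
    return dense
```

Only half of the weights are stored, plus 2 bits of position per kept weight. The regular pattern is what lets GPUs with sparse tensor cores (NVIDIA Ampere and newer) skip the pruned multiplications.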


## 🌅 Overview

TorchAO is an easy-to-use quantization library for native PyTorch. It works out of the box with `torch.compile()` and `FSDP2` across most Hugging Face PyTorch models.

For a detailed overview of stable and prototype workflows for different hardware and dtypes, see the [Workflows documentation](https://docs.pytorch.org/ao/main/workflows.html).

Check out our [docs](https://docs.pytorch.org/ao/main/) for more details!

## 🚀 Quick Start

First, install TorchAO. We recommend installing the latest stable version:
```bash
pip install torchao
```

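Quantize your model weights to int4!
```python
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# `model` is an existing torch.nn.Module (for example, a bfloat16 model on a CUDA device)
quantize_(model, Int4WeightOnlyConfig(group_size=32))
```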

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `0.17.0` | 2026-04-21 | Low | Imported from PyPI (0.17.0) |
| `v0.17.0` | 2026-03-30 | Medium | **Highlights:** We are excited to announce the 0.17 release of torchao! This release adds support for CuteDSL MXFP8 MoE kernels, per-head FP8 quantized low-precision attention, ABI stability, and more! **CuteDSL MXFP8 MoE kernels:** We added a new CuteDSL MXFP8 quantization kernel for 3D expert weights that writes scale factors directly to the blocked layout used by tensor cores ([#4090](https://github.com/pytorch/ao/pull/4090)). Used for scaling along dim1 in… |
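
The 0.17 notes mention per-head FP8 quantized attention. As a rough illustration of why one scale per head can help (our assumption about the motivation, not a description of torchao's kernel), the sketch compares a single per-tensor scale with per-head scales and counts values that would underflow to zero in float8 e4m3. The `[head][token][dim]` layout, the `amax / 448` scaling rule, and all function names are hypothetical.

```python
# Minimal sketch of per-head vs per-tensor FP8 scale factors in pure Python.
# Assumptions: a [head][token][dim] layout (batch dropped), amax / 448 scaling for
# float8 e4m3, and no FP8 rounding beyond an underflow check. Names are hypothetical.
E4M3_MAX = 448.0                 # largest finite e4m3 value
E4M3_MIN_SUBNORMAL = 2.0 ** -9   # smallest positive e4m3 value


def _amax(values):
    return max((abs(v) for v in values), default=0.0)


def per_tensor_scale(x):
    """A single scale shared by every head."""
    amax = _amax(v for head in x for token in head for v in token)
    return amax / E4M3_MAX if amax > 0 else 1.0


def per_head_scales(x):
    """One scale per head, so each head's own amax maps to 448."""
    scales = []
    for head in x:
        amax = _amax(v for token in head for v in token)
        scales.append(amax / E4M3_MAX if amax > 0 else 1.0)
    return scales


def apply_scales(x, scales):
    """Divide every element of head h by scales[h]."""
    return [[[v / s for v in token] for token in head] for head, s in zip(x, scales)]


def count_underflows(scaled):
    """Nonzero values below half the smallest e4m3 subnormal; these would round to 0."""
    return sum(1 for head in scaled for token in head for v in token
               if 0 < abs(v) < E4M3_MIN_SUBNORMAL / 2)
```

With `x = [[[1e-4, 1e-6], [-5e-5, 2e-6]], [[10.0, -3.0], [0.5, 7.0]]]`, the large-magnitude second head dictates the per-tensor scale and two of the first head's values fall below the e4m3 range; with one scale per head, none do.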

## Citation

- HTML: https://www.freshcrate.ai/projects/torchao
- Markdown: https://www.freshcrate.ai/projects/torchao.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/torchao/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._
