# faster-whisper

> Faster Whisper transcription with CTranslate2

- **URL**: https://www.freshcrate.ai/projects/faster-whisper
- **Author**: Guillaume Klein
- **Category**: Frameworks
- **Latest version**: `1.2.1` (2026-04-21)
- **License**: MIT
- **Source**: https://github.com/SYSTRAN/faster-whisper
- **Language**: Python
- **GitHub**: 22,327 stars, 1,813 forks
- **Registry**: pypi (`faster-whisper`)
- **Tags**: `ctranslate2`, `inference`, `openai`, `pypi`, `quantization`, `speech`, `transformer`, `whisper`

## Description

[![CI](https://github.com/SYSTRAN/faster-whisper/workflows/CI/badge.svg)](https://github.com/SYSTRAN/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)

**faster-whisper** is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
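
As a rough illustration of the quantization option mentioned above, the sketch below picks a CTranslate2 compute type per device and loads a model with it. `WhisperModel`, its `compute_type` argument, and `transcribe` come from the faster-whisper package. The helper names `pick_compute_type` and `transcribe_file`, the `"small"` model size, and the default arguments are assumptions made for illustration only.

```python
def pick_compute_type(device: str, quantize: bool) -> str:
    """Return a CTranslate2 compute type string for the given device."""
    if quantize:
        # 8-bit quantization: combined with float16 on GPU, plain int8 on CPU.
        return "int8_float16" if device == "cuda" else "int8"
    return "float16" if device == "cuda" else "float32"


def transcribe_file(path: str, device: str = "cpu", quantize: bool = True):
    """Transcribe one audio file and return (segments, detected language)."""
    # Imported lazily so pick_compute_type works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        "small",  # assumed model size, chosen only for the example
        device=device,
        compute_type=pick_compute_type(device, quantize),
    )
    segments, info = model.transcribe(path, beam_size=5)
    # `segments` is a generator, so transcription runs while it is consumed.
    return [(s.start, s.end, s.text) for s in segments], info.language
```

Calling `transcribe_file("audio.mp3", device="cuda")` would then run with `int8_float16`, assuming a CUDA-capable setup and an existing `audio.mp3`.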

## Benchmark

### Whisper

For reference, here are the time and memory usage required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:

* [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/tree/v20240930)
* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[v1.7.2](https://github.com/ggerganov/whisper.cpp/tree/v1.7.2)
* [transformers](https://github.com/huggingface/transformers)@[v4.46.3](https://github.com/huggingface/transformers/tree/v4.46.3)
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0)

### Large-v2 model on GPU

| Implementation | Precision | Beam size | Time | VRAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |
| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |
| faster-whisper | int8 | 5 | 59s | 2926MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |
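
To make the table easier to compare, the following sketch converts the reported times to seconds and computes each faster-whisper speedup relative to openai/whisper. The durations are copied from the table above. The helper `to_seconds` and the dictionary keys are hypothetical names introduced for this example.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '2m23s' or '59s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# Times copied from the Large-v2 GPU table.
gpu_times = {
    "openai/whisper fp16": "2m23s",
    "faster-whisper fp16": "1m03s",
    "faster-whisper fp16 batch_size=8": "17s",
    "faster-whisper int8": "59s",
    "faster-whisper int8 batch_size=8": "16s",
}

baseline = to_seconds(gpu_times["openai/whisper fp16"])
speedups = {
    name: round(baseline / to_seconds(time), 2) for name, time in gpu_times.items()
}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

With these figures, the unbatched fp16 run comes out roughly 2.3 times faster than openai/whisper, and the batched fp16 run roughly 8.4 times faster.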

### distil-whisper-large-v3 model on GPU

| Implementation | Precision | Beam size | Time | YT Commons WER |
| --- | --- | --- | --- | --- |
| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |
| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |

*GPU benchmarks were run with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.*

[^1]: transformers runs out of memory (OOM) for any batch size > 1.

### Small model on CPU

| Implementation | Precision | Beam size | Time | RAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |
| faster-whisper | int8 | 5 | 1m42s | 1477MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |

*Executed with 8 threads on an Intel Core i7-12700K.*
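
A setup similar to the `batch_size=8` CPU rows could be approximated as in the sketch below. It assumes those rows were produced with the package's `BatchedInferencePipeline`, and it uses the `cpu_threads` argument of `WhisperModel`. The helper names `build_cpu_kwargs` and `transcribe_batched`, the `"small"` model size, and the defaults are assumptions, not the benchmark script itself.

```python
def build_cpu_kwargs(threads: int = 8, quantize: bool = True) -> dict:
    """Keyword arguments for WhisperModel that mirror the CPU benchmark setup."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return {
        "device": "cpu",
        "compute_type": "int8" if quantize else "float32",
        "cpu_threads": threads,
    }


def transcribe_batched(path: str, batch_size: int = 8) -> list:
    """Transcribe one file with batched inference and return the segment texts."""
    # Lazy import: requires the faster-whisper package to be installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model = WhisperModel("small", **build_cpu_kwargs())
    pipeline = BatchedInferencePipeline(model=model)
    segments, _info = pipeline.transcribe(path, batch_size=batch_size, beam_size=5)
    return [segment.text for segment in segments]
```

As the tables suggest, larger batch sizes trade extra memory for shorter run time, so `batch_size` is worth tuning against the available RAM or VRAM.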


## Requirements

* Python 3.9 or greater

Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV) which bundles the FFmpeg libraries in its package.
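
The requirements above can be checked from Python before installing or running anything heavy. This is a minimal sketch using only the standard library. The function name `check_requirements` and the report keys are hypothetical, and `av` is assumed to be the import name of PyAV.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the requirements


def check_requirements() -> dict:
    """Report whether the interpreter and the relevant packages look usable."""
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when a top-level package is not installed.
        "av_installed": importlib.util.find_spec("av") is not None,
        "faster_whisper_installed": importlib.util.find_spec("faster_whisper") is not None,
    }


report = check_requirements()
print(report)
```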

### GPU

GPU execution requires the following NVIDIA libraries to be installed:

* [cuBLAS for CUDA 12](https://developer.nvidia.com/cublas)
* [cuDNN 9 for CUDA 12](https://developer.nvidia.com/cudnn)

**Note**: The latest versions of `ctranslate2` only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade `ctranslate2` to version `3.24.0`; for CUDA 12 and cuDNN 8, downgrade to version `4.4.0`. This can be done with `pip install --force-reinstall ctranslate2==4.4.0` or by specifying the version in a `requirements.txt`.
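
The version combinations in the note can be written down as a small lookup. The sketch below only encodes the three combinations the note mentions. The function name `ctranslate2_pin` is hypothetical, and any other combination is treated as not covered.

```python
from typing import Optional


def ctranslate2_pin(cuda_major: int, cudnn_major: int) -> Optional[str]:
    """Return a pip requirement for ctranslate2, or None if the note does not cover the combination."""
    pins = {
        (11, 8): "ctranslate2==3.24.0",
        (12, 8): "ctranslate2==4.4.0",
        (12, 9): "ctranslate2",  # latest release, no pin needed
    }
    return pins.get((cuda_major, cudnn_major))


print(ctranslate2_pin(12, 8))
```

The returned string can be passed to `pip install --force-reinstall` or placed in a `requirements.txt`.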

There are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below. 

<details>
<summary>Other installation methods (click to expand)</summary>


**Note:** For all the methods below, keep in mind the note above regarding CUDA versions. Depending on your setup, you may need to install the _CUDA 11_ versions of libraries that correspond to the CUDA 12 libraries listed in the instructions below.

#### Use Docker

The libraries (cuBLAS, cuDNN) are installed in the official NVIDIA CUDA Docker image `nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04`.

#### Install with `pip` (Linux only)

On Linux these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.

```bash
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
```
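
The one-liner above relies on each package exposing `__file__`, which may not be set when the NVIDIA wheels install as namespace packages. As a hedged alternative, the sketch below builds the same path list from the import machinery and skips packages that are missing. The function name `nvidia_lib_dirs` is hypothetical, and the printed value still has to be exported in the shell before launching Python.

```python
import importlib.util
import os


def nvidia_lib_dirs(packages=("nvidia.cublas.lib", "nvidia.cudnn.lib")) -> list:
    """Collect the directories of the pip-installed NVIDIA libraries that are present."""
    dirs = []
    for name in packages:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None  # a parent package (for example `nvidia`) is not installed
        if spec is None:
            continue
        # Packages list their directories here even when __file__ is unavailable.
        dirs.extend(spec.submodule_search_locations or [])
    return dirs


ld_library_path = os.pathsep.join(nvidia_lib_dirs())
print(f"export LD_LIBRARY_PATH={ld_library_path}")
```

If neither package is installed, the printed value is empty, which is a quick sign that the `pip install` step has not been run in the active environment.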

#### Download the libraries from Purfview's repository (Windows & Linux)

Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provid

</details>

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `1.2.1` | 2026-04-21 | Low | Imported from PyPI (1.2.1) |
| `v1.2.1` | 2025-10-31 | Low | ## What's Changed * only merge when `clip_timestamps` are not provided by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/pull/1345 * Fix: Prevent <\|nocaptions\|> tokens in BatchedInferencePipeline by @mmichelli in https://github.com/SYSTRAN/faster-whisper/pull/1338 * Upgrade to Silero-VAD V6 by @MahmoudAshraf97 and @sssshhhhhh in https://github.com/SYSTRAN/faster-whisper/pull/1373 * Offload retry logic to hf hub by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/p |

## Citation

- HTML: https://www.freshcrate.ai/projects/faster-whisper
- Markdown: https://www.freshcrate.ai/projects/faster-whisper.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/faster-whisper/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._
