
faster-whisper

Faster Whisper transcription with CTranslate2

Description

[![CI](https://github.com/SYSTRAN/faster-whisper/workflows/CI/badge.svg)](https://github.com/SYSTRAN/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)

**faster-whisper** is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), a fast inference engine for Transformer models. This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) at the same accuracy while using less memory. Efficiency can be further improved with 8-bit quantization on both CPU and GPU.

## Benchmark

### Whisper

For reference, here are the time and memory usage required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:

* [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/tree/v20240930)
* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[v1.7.2](https://github.com/ggerganov/whisper.cpp/tree/v1.7.2)
* [transformers](https://github.com/huggingface/transformers)@[v4.46.3](https://github.com/huggingface/transformers/tree/v4.46.3)
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0)

### Large-v2 model on GPU

| Implementation | Precision | Beam size | Time | VRAM usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |
| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |
| faster-whisper | int8 | 5 | 59s | 2926MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |

### distil-whisper-large-v3 model on GPU

| Implementation | Precision | Beam size | Time | YT Commons WER |
| --- | --- | --- | --- | --- |
| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |
| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |

*GPU benchmarks were executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.*

[^1]: transformers OOMs for any batch size > 1

### Small model on CPU

| Implementation | Precision | Beam size | Time | RAM usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |
| faster-whisper | int8 | 5 | 1m42s | 1477MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |

*Executed with 8 threads on an Intel Core i7-12700K.*

## Requirements

* Python 3.9 or greater

Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV), which bundles the FFmpeg libraries in its package.

### GPU

GPU execution requires the following NVIDIA libraries to be installed:

* [cuBLAS for CUDA 12](https://developer.nvidia.com/cublas)
* [cuDNN 9 for CUDA 12](https://developer.nvidia.com/cudnn)

**Note**: The latest versions of `ctranslate2` only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade to version `3.24.0` of `ctranslate2`; for CUDA 12 and cuDNN 8, downgrade to version `4.4.0` of `ctranslate2` (for example with `pip install --force-reinstall ctranslate2==4.4.0`, or by pinning the version in a `requirements.txt`).

There are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.

<details>
<summary>Other installation methods (click to expand)</summary>

**Note:** For all the methods below, keep in mind the above note regarding CUDA versions. Depending on your setup, you may need to install the _CUDA 11_ versions of the libraries that correspond to the CUDA 12 libraries listed in the instructions below.

#### Use Docker

The libraries (cuBLAS, cuDNN) are installed in the official NVIDIA CUDA Docker image `nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04`.

#### Install with `pip` (Linux only)

On Linux these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.

```bash
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
```

#### Download the libraries from Purfview's repository (Windows & Linux)

Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provid…

</details>
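The README excerpt above ends before its usage examples. As a gap-filler, here is a minimal transcription sketch using the int8 quantization mentioned in the description. It assumes faster-whisper is installed (`pip install faster-whisper`); the import is guarded so the snippet loads even without it, and the model size, device, and audio path are illustrative placeholders rather than recommendations from the project.

```python
# Minimal faster-whisper usage sketch; model size, device, and compute_type
# are illustrative. The import is guarded so this file can be loaded even
# when faster-whisper is not installed.
try:
    from faster_whisper import WhisperModel
except ImportError:  # faster-whisper not installed; keep the sketch importable
    WhisperModel = None


def transcribe(path: str, model_size: str = "small") -> list[str]:
    """Transcribe an audio file and return the text of each segment."""
    if WhisperModel is None:
        raise RuntimeError("faster-whisper is not installed")
    # int8 quantization reduces memory use on both CPU and GPU,
    # per the benchmark tables above.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, beam_size=5)
    return [segment.text for segment in segments]
```

Note that `segments` is a lazy generator: transcription only runs as the list comprehension consumes it.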

Release History

1.2.1 (Low urgency, 4/21/2026)
Imported from PyPI (1.2.1)

v1.2.1 (Low urgency, 10/31/2025)
What's Changed:
* only merge when `clip_timestamps` are not provided by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/pull/1345
* Fix: Prevent `<|nocaptions|>` tokens in BatchedInferencePipeline by @mmichelli in https://github.com/SYSTRAN/faster-whisper/pull/1338
* Upgrade to Silero-VAD V6 by @MahmoudAshraf97 and @sssshhhhhh in https://github.com/SYSTRAN/faster-whisper/pull/1373
* Offload retry logic to hf hub by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/p…

v1.2.0 (Low urgency, 8/6/2025)
What's Changed:
* feat: allow passing specific revision to download by @felixmosh in https://github.com/SYSTRAN/faster-whisper/pull/1292
* Support `distil-large-v3.5` by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/pull/1311
* Feature: Allow loading of private HF models by @r15hil in https://github.com/SYSTRAN/faster-whisper/pull/1309
* bugfix: Get correct chunk index when restoring timestamps by @MahmoudAshraf97 in https://github.com/SYSTRAN/faster-whisper/pull/1336
* Re…

v1.1.1 (Low urgency, 1/1/2025)
What's Changed:
* Brings back original VAD parameters naming by @Purfview in https://github.com/SYSTRAN/faster-whisper/pull/1181
* Make batched `suppress_tokens` behaviour the same as in sequential by @Purfview in https://github.com/SYSTRAN/faster-whisper/pull/1194
* Fixes OOM errors (too high RAM usage by VAD) by @Purfview in https://github.com/SYSTRAN/faster-whisper/pull/1198
* Add duration of audio and VAD-removed duration to `BatchedInferencePipeline` by @greenw0lf in https://github.com/SY…

v1.1.0 (Low urgency, 11/21/2024)
New features:
* New batched inference that is 4x faster and accurate; refer to the [README](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0?tab=readme-ov-file#batched-transcription) for usage instructions.
* Support for the new `large-v3-turbo` model.
* VAD filter is now 3x faster on CPU.
* Feature extraction is now 3x faster.
* Added `log_progress` to `WhisperModel.transcribe` to print transcription progress.
* Added `multilingual` option to transcription to allow transcribing multilin…

v1.0.3 (Low urgency, 7/1/2024)
Upgrade the Silero-VAD model to the latest V5 version (https://github.com/SYSTRAN/faster-whisper/pull/884). Silero-VAD V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0
- The window_size_samples parameter is fixed at 512.
- Change to use the state variable instead of the existing h and c variables.
- Slightly changed internal logic; some context (part of the previous chunk) is now passed along with the current chunk.
- Change the dimensions of the state variable from 64 to 128.
- Re…

v1.0.2 (Low urgency, 5/6/2024)
* Add support for distil-large-v3 (https://github.com/SYSTRAN/faster-whisper/pull/755). The latest Distil-Whisper model, [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3-ct2), is intrinsically designed to work with the OpenAI sequential algorithm.
* Benchmarks (https://github.com/SYSTRAN/faster-whisper/pull/773): introduces functionality to benchmark memory, Word Error Rate (WER), and speed in faster-whisper.
* Support initializing more whisper model…

v1.0.1 (Low urgency, 3/1/2024)
Bug fixes and performance improvements:
* Update logic to get segment from features before encoding (https://github.com/SYSTRAN/faster-whisper/pull/705)
* Fix window end heuristic for hallucination_silence_threshold (https://github.com/SYSTRAN/faster-whisper/pull/706)

v0.10.1 (Low urgency, 2/22/2024)
Fix the broken tag [v0.10.0](https://github.com/SYSTRAN/faster-whisper/releases/tag/v0.10.0)

v0.10.0 (Low urgency, 2/22/2024)
* Support the "large-v3" model, with:
  * the ability to load `feature_size/num_mels` and other settings from `preprocessor_config.json`
  * a new language token for Cantonese (`yue`)
* Update `CTranslate2` requirement to include the latest version 3.22.0
* Update `tokenizers` requirement to include the latest version 0.15
* Change the hub to fetch models from the [Systran organization](https://huggingface.co/Systran)

v1.0.0 (Low urgency, 2/22/2024)
* Support the distil-whisper model (https://github.com/SYSTRAN/faster-whisper/pull/557): robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
* Upgrade ctranslate2 version to 4.0 to support CUDA 12 (https://github.com/SYSTRAN/faster-whisper/pull/694)
* Upgrade PyAV version to 11.* to support Python 3.12.x (https://github.com/SYSTRAN/faster-whisper/pull/679)
* Small bug fixes
* Illogi…

v0.9.0 (Low urgency, 9/18/2023)
* Add function `faster_whisper.available_models()` to list the available model sizes
* Add model property `supported_languages` to list the languages accepted by the model
* Improve error message for invalid `task` and `language` parameters
* Update `tokenizers` requirement to include the latest version 0.14

v0.8.0 (Low urgency, 9/4/2023)
Expose new transcription options. Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
* `repetition_penalty` to penalize the score of previously generated tokens (set > 1 to penalize)
* `no_repeat_ngram_size` to prevent repetitions of ngrams with this size
Some values that were previously hardcoded in the transcription method:
* `prompt_reset_on_temperature` to configure after which temperature fallback step the prompt with the…

v0.7.1 (Low urgency, 7/24/2023)
* Fix a bug related to `no_speech_threshold`: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech
* Improve selection of the final result when all temperature fallbacks failed, by returning the result with the best log probability

v0.7.0 (Low urgency, 7/18/2023)
Improve word-level timestamp heuristics. Some recent improvements from openai-whisper are ported to faster-whisper:
* Squash long words at window and sentence boundaries (https://github.com/openai/whisper/commit/255887f219e6b632bc1a6aac1caf28eecfca1bac)
* Improve timestamp heuristics (https://github.com/openai/whisper/commit/f572f2161ba831bae131364c3bffdead7af6d210)
Support download of user-converted models from the Hugging Face Hub: the `WhisperModel` constructor now accepts a…

v0.6.0 (Low urgency, 5/24/2023)
Extend `TranscriptionInfo` with additional properties:
* `all_language_probs`: the probability of each language (only set when `language=None`)
* `vad_options`: the VAD options that were used for this transcription
Improve robustness to temporary connection issues with the Hugging Face Hub: when the model is loaded from its name like `WhisperModel("large-v2")`, a request is made to the Hugging Face Hub to check if some files should be downloaded. It can happen that this request ra…

v0.5.1 (Low urgency, 4/26/2023)
Fix `download_root` to correctly set the cache directory where the models are downloaded.

v0.5.0 (Low urgency, 4/25/2023)
Improved logging: some information is now logged under the `INFO` and `DEBUG` levels. The logging level can be configured like this:
```python
import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
```
More control over model downloads: new arguments were added to the `WhisperModel` constructor to better control how the models are downloaded:
* `download_root` to specify where the model should be downloaded.
* `local_files_only` to…

v0.4.1 (Low urgency, 4/4/2023)
Fix some `IndexError` exceptions:
* when VAD is enabled and a predicted timestamp is after the last speech chunk
* when word timestamps are enabled and the model predicts a token sequence that is decoded to invalid Unicode characters

v0.4.0 (Low urgency, 4/3/2023)
Integration of Silero VAD: the [Silero VAD](https://github.com/snakers4/silero-vad) model is integrated to ignore parts of the audio without speech:
```python
model.transcribe(..., vad_filter=True)
```
The default behavior is conservative and only removes silence longer than 2 seconds. See the README to find out how to customize the VAD parameters.
**Note:** the Silero model is executed with `onnxruntime`, which is currently not released for Python 3.11. The dependency is excluded for…

v0.3.0 (Low urgency, 3/24/2023)
* Converted models are now available on the [Hugging Face Hub](https://huggingface.co/guillaumekln) and are automatically downloaded when creating a `WhisperModel` instance. The conversion step is no longer required for the original Whisper models.
```python
# Automatically download https://huggingface.co/guillaumekln/faster-whisper-large-v2
model = WhisperModel("large-v2")
```
* Run the encoder only once for each 30-second window. Before this change the same window could be encoded mul…

v0.2.0 (Low urgency, 3/22/2023)
Initial publication of the library on PyPI: https://pypi.org/project/faster-whisper/
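Several release entries above (v1.1.0's batched inference, the `BatchedInferencePipeline` fixes in v1.1.1 and v1.2.1) refer to the batched API. The following is a hedged sketch of how it is typically wired up; the import is guarded so the snippet loads without the package, and the model size, device, and `batch_size=8` (mirroring the benchmark rows) are illustrative values, not project recommendations.

```python
# Sketch of batched transcription via BatchedInferencePipeline (v1.1.0+).
# All parameter values are illustrative; a GPU build is assumed for "cuda".
try:
    from faster_whisper import BatchedInferencePipeline, WhisperModel
except ImportError:  # faster-whisper not installed; keep the sketch importable
    BatchedInferencePipeline = WhisperModel = None


def transcribe_batched(path: str, batch_size: int = 8) -> list[str]:
    """Transcribe an audio file with batched inference and return segment texts."""
    if WhisperModel is None:
        raise RuntimeError("faster-whisper is not installed")
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    # The pipeline wraps an existing model and batches independent chunks,
    # which is where the ~4x speedup in the v1.1.0 notes comes from.
    pipeline = BatchedInferencePipeline(model=model)
    segments, info = pipeline.transcribe(path, batch_size=batch_size)
    return [segment.text for segment in segments]
```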


Similar Packages

* ctranslate2 (4.7.1): Fast inference engine for Transformer models
* transformers (5.5.4): Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
* pre-commit (v4.6.0): A framework for managing and maintaining multi-language pre-commit hooks.
* azure-core-tracing-opentelemetry (azure-template_0.1.0b6187637): Microsoft Azure Core OpenTelemetry plugin library for Python
* spdx-tools (0.8.5): SPDX parser and tools.