CTranslate2-based optimized implementation of Whisper speech recognition, up to 4x faster than the original with lower memory usage
Faster-Whisper Project Details
Project Overview
Faster-Whisper is a re-implementation of the OpenAI Whisper model, utilizing CTranslate2 as a fast inference engine. Compared to the original openai/whisper, it achieves up to 4x faster inference while maintaining the same accuracy and consuming less memory. Further efficiency gains can be achieved on both CPU and GPU through 8-bit quantization.
Core Features
🚀 Performance Advantages
- Speed Boost: Up to 4x faster than the original Whisper
- Memory Optimization: Lower memory footprint
- Quantization Support: Supports 8-bit quantization for further performance enhancement
- Batch Processing: Supports batch transcription to improve throughput
🛠️ Technical Features
- Based on the CTranslate2 inference engine
- Supports GPU and CPU execution
- Compatible with original Whisper models
- Supports multiple precision modes (FP16, FP32, INT8)
- Built-in audio decoding (no FFmpeg required)
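Because decoding goes through PyAV, audio can also be decoded to a raw waveform first and the array passed straight to transcribe. A minimal sketch, assuming the decode_audio helper exported by recent faster-whisper releases:
from faster_whisper import WhisperModel, decode_audio
# Decode to a mono float32 waveform at the 16 kHz rate Whisper expects
audio = decode_audio("audio.mp3", sampling_rate=16000)
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe(audio)  # a NumPy array works like a file path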
Performance Comparison
GPU Benchmark (NVIDIA RTX 3070 Ti 8GB)
Performance comparison for transcribing 13 minutes of audio:
| Implementation | Precision | Beam Size | Time | VRAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
| faster-whisper | fp16 | 5 | 1m47s | 3244MB |
| faster-whisper | int8 | 5 | 1m33s | 2926MB |
CPU Benchmark (Intel Core i7-12700K)
| Implementation | Precision | Beam Size | Time | RAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
| faster-whisper | int8 | 5 | 1m42s | 1477MB |
Installation Instructions
System Requirements
- Python 3.9 or higher
- NVIDIA CUDA library support required for GPU execution
Basic Installation
pip install faster-whisper
GPU Support Installation
NVIDIA libraries are required:
- CUDA 12.x
- cuDNN 9.x
- cuBLAS
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
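To verify the libraries are picked up, a quick check using CTranslate2's get_cuda_device_count helper:
import ctranslate2
# A non-zero count means CUDA, cuBLAS, and cuDNN were found and a GPU is usable
print("CUDA devices visible:", ctranslate2.get_cuda_device_count())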
Docker Method
# Use official NVIDIA CUDA image
docker run --gpus all -it nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04
Usage Guide
Basic Transcription
from faster_whisper import WhisperModel
model_size = "large-v3"
# GPU execution (FP16)
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# Or GPU execution (INT8)
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# Or CPU execution (INT8)
# model = WhisperModel(model_size, device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Batch Transcription
from faster_whisper import WhisperModel, BatchedInferencePipeline
model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Word-level Timestamps
segments, _ = model.transcribe("audio.mp3", word_timestamps=True)
for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
Voice Activity Detection (VAD)
# Enable VAD filtering
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
# Custom VAD parameters
segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
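The filter is backed by the Silero VAD model, and further options such as threshold and speech_pad_ms can be passed the same way (parameter names follow faster-whisper's VadOptions and may differ between versions):
segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(
        threshold=0.5,                # speech probability cut-off
        min_silence_duration_ms=500,  # silence length that splits segments
        speech_pad_ms=400,            # padding kept around detected speech
    ),
)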
Distil-Whisper Support
from faster_whisper import WhisperModel
model_size = "distil-large-v3"
model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,
    language="en",
    condition_on_previous_text=False,
)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Model Conversion
From Transformers
pip install "transformers[torch]>=4.23"
ct2-transformers-converter \
    --model openai/whisper-large-v3 \
    --output_dir whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16
Loading Custom Models
# Load from local directory
model = WhisperModel("whisper-large-v3-ct2")
# Load from Hugging Face Hub
model = WhisperModel("username/whisper-large-v3-ct2")
Application Scenarios
- Speech-to-text
- Real-time transcription
- Subtitle generation
- Multilingual translation
- Speech analysis
- Audio content indexing
Configuration and Optimization
Logging Configuration
import logging
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
Performance Optimization Suggestions
- Use GPU acceleration for optimal performance
- Select appropriate precision mode based on hardware
- Utilize batch processing to improve throughput
- Enable VAD filtering to reduce processing time
- Tune beam size and batch size for your workload (see the combined sketch below)
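A rough sketch combining several of these suggestions (int8 weights, batched inference, and VAD filtering; the model name, batch size, and options here are illustrative, and the batched pipeline's exact parameters may vary between versions):
from faster_whisper import WhisperModel, BatchedInferencePipeline
# int8_float16 reduces VRAM on GPU; use compute_type="int8" on CPU-only machines
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe(
    "audio.mp3",
    batch_size=8,      # raise for throughput, lower if VRAM is tight
    vad_filter=True,   # skip silent regions to reduce work
)
for segment in segments:
    print(segment.text)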
Thread Configuration
# Set CPU thread count
OMP_NUM_THREADS=4 python3 my_script.py
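Threading can also be configured per model; WhisperModel exposes cpu_threads and num_workers arguments:
# cpu_threads: threads used per inference; num_workers: parallel transcribe() calls
model = WhisperModel("small", device="cpu", compute_type="int8",
                     cpu_threads=4, num_workers=1)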
Technical Architecture
Core Components
- CTranslate2: Fast inference engine
- PyAV: Audio decoding library
- Silero VAD: Voice Activity Detection
- Transformers: Model conversion support
Supported Models
- OpenAI Whisper series (tiny, base, small, medium, large-v1/v2/v3)
- Distil-Whisper series
- Custom fine-tuned models
Community and Support
- GitHub Repository: https://github.com/SYSTRAN/faster-whisper
- PyPI Package: https://pypi.org/project/faster-whisper/
- Hugging Face Models: https://huggingface.co/Systran
Summary
Faster-Whisper is a high-performance speech recognition solution: an optimized inference engine delivers significant speedups while preserving the accuracy of the original Whisper. Its feature set, ecosystem support, and straightforward API make it a strong choice for speech recognition applications, letting developers and researchers alike build efficient speech processing pipelines quickly.