A CTranslate2-based optimized implementation of Whisper speech recognition: up to 4x faster than the original with lower memory usage

License: MIT | Language: Python | Stars: 17.0k | Organization: SYSTRAN | Repository: faster-whisper | Last Updated: 2025-06-02

Faster-Whisper Project Details

Project Overview

Faster-Whisper is a reimplementation of OpenAI's Whisper model that uses CTranslate2, a fast inference engine for Transformer models. Compared to the original openai/whisper, it achieves up to 4x faster inference at the same accuracy while using less memory. Efficiency can be improved further on both CPU and GPU with 8-bit quantization.

Core Features

🚀 Performance Advantages

  • Speed Boost: Up to 4x faster than the original Whisper
  • Memory Optimization: Lower memory footprint
  • Quantization Support: Supports 8-bit quantization for further performance enhancement
  • Batch Processing: Supports batch transcription to improve throughput

🛠️ Technical Features

  • Based on the CTranslate2 inference engine
  • Supports GPU and CPU execution
  • Compatible with original Whisper models
  • Supports multiple precision modes (FP16, FP32, INT8)
  • Built-in audio decoding (no FFmpeg required)

Performance Comparison

GPU Benchmark (NVIDIA RTX 3070 Ti 8GB)

Performance comparison for transcribing 13 minutes of audio:

Implementation                 Precision   Beam Size   Time    VRAM Usage
openai/whisper                 fp16        5           2m23s   4708MB
whisper.cpp (Flash Attention)  fp16        5           1m05s   4127MB
faster-whisper                 fp16        5           1m47s   3244MB
faster-whisper                 int8        5           1m33s   2926MB

CPU Benchmark (Intel Core i7-12700K)

Implementation    Precision   Beam Size   Time    RAM Usage
openai/whisper    fp32        5           6m58s   2335MB
whisper.cpp       fp32        5           2m05s   1049MB
faster-whisper    fp32        5           2m37s   2257MB
faster-whisper    int8        5           1m42s   1477MB

Installation Instructions

System Requirements

  • Python 3.9 or higher
  • GPU execution requires NVIDIA CUDA libraries (detailed below)

Basic Installation

pip install faster-whisper

GPU Support Installation

NVIDIA libraries are required:

  • CUDA 12.x
  • cuDNN 9.x
  • cuBLAS

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

Docker Method

# Use official NVIDIA CUDA image
docker run --gpus all -it nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04
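# Then install faster-whisper inside the container
pip install faster-whisper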

Usage Guide

Basic Transcription

from faster_whisper import WhisperModel

model_size = "large-v3"

# GPU execution (FP16)
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# Or GPU execution (INT8)
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")

# Or CPU execution (INT8)
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Batch Transcription

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Word-level Timestamps

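# Reuses the `model` instance created in the basic transcription example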
segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))

Voice Activity Detection (VAD)

# Enable VAD filtering
segments, _ = model.transcribe("audio.mp3", vad_filter=True)

# Custom VAD parameters
segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)

Distil-Whisper Support

from faster_whisper import WhisperModel

model_size = "distil-large-v3"
model = WhisperModel(model_size, device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.mp3", 
    beam_size=5, 
    language="en", 
    condition_on_previous_text=False
)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Model Conversion

From Transformers

pip install "transformers[torch]>=4.23"

ct2-transformers-converter \
    --model openai/whisper-large-v3 \
    --output_dir whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16

Loading Custom Models

# Load from local directory
model = WhisperModel("whisper-large-v3-ct2")

# Load from Hugging Face Hub
model = WhisperModel("username/whisper-large-v3-ct2")

Application Scenarios

  • Speech-to-text
  • Real-time transcription
  • Subtitle generation (see the SRT sketch after this list)
  • Multilingual translation
  • Speech analysis
  • Audio content indexing
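
To illustrate the subtitle-generation scenario, here is a minimal sketch that writes transcription segments to an SRT file. Only segment.start, segment.end, and segment.text come from the faster-whisper API; to_srt_timestamp and the file names are illustrative:

from faster_whisper import WhisperModel

def to_srt_timestamp(seconds: float) -> str:
    # Illustrative helper: format seconds as HH:MM:SS,mmm for SRT
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    millis = int((seconds - int(seconds)) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, _ = model.transcribe("audio.mp3")

with open("audio.srt", "w", encoding="utf-8") as f:
    for index, segment in enumerate(segments, start=1):
        f.write(f"{index}\n")
        f.write(f"{to_srt_timestamp(segment.start)} --> {to_srt_timestamp(segment.end)}\n")
        f.write(f"{segment.text.strip()}\n\n")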

Configuration and Optimization

Logging Configuration

import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

Performance Optimization Suggestions

  • Use GPU acceleration for optimal performance
  • Select appropriate precision mode based on hardware
  • Utilize batch processing to improve throughput
  • Enable VAD filtering to reduce processing time
  • Tune beam size and batch size for your workload (a combined sketch follows this list)
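
A sketch combining several of these suggestions (INT8 quantization, batching, and VAD filtering); the values are illustrative starting points, and parameter support in BatchedInferencePipeline.transcribe may vary by version:

from faster_whisper import WhisperModel, BatchedInferencePipeline

# INT8 weights reduce VRAM usage; batching raises throughput; VAD skips silence
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe(
    "audio.mp3",
    beam_size=5,      # wider beams trade speed for accuracy
    batch_size=16,    # adjust to fit available VRAM
    vad_filter=True,  # skip silent regions before decoding
)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))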

Thread Configuration

# Set CPU thread count
OMP_NUM_THREADS=4 python3 my_script.py
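
The thread count can also be set in code: WhisperModel accepts cpu_threads (threads used per transcription) and num_workers (parallel transcribe() calls); exact defaults depend on the installed version:

from faster_whisper import WhisperModel

# cpu_threads: intra-op threads for a single transcription
# num_workers: lets multiple transcribe() calls run in parallel on one model
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4, num_workers=1)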

Technical Architecture

Core Components

  • CTranslate2: Fast inference engine
  • PyAV: Audio decoding library
  • Silero VAD: Voice Activity Detection
  • Transformers: Model conversion support

Supported Models

  • OpenAI Whisper series (tiny, base, small, medium, large-v1/v2/v3, turbo)
  • Distil-Whisper series
  • Custom fine-tuned models

Summary

Faster-Whisper is a high-performance speech recognition solution: its optimized inference engine delivers significant speedups while preserving the accuracy of the original Whisper. A rich feature set, robust ecosystem support, and a user-friendly API make it a strong choice for speech recognition applications, letting developers and researchers alike build efficient speech processing pipelines quickly.
