
Netflix-level video translation, localization, and dubbing tool, providing one-click AI subtitle segmentation, translation, alignment, and dubbing.

Apache-2.0 · Python · 13.7k stars · Huanshere/VideoLingo · Last Updated: 2025-05-18

VideoLingo - Netflix-Level AI Video Translation, Subtitling, and Dubbing Tool

🌟 Project Overview

VideoLingo is an all-in-one tool for video translation, localization, and dubbing, aimed at generating Netflix-quality subtitles. It eliminates stilted machine translations and multi-line subtitles while adding high-quality dubbing, enabling knowledge sharing across language barriers worldwide.

🎯 Core Features

Key Features

  • 🎥 YouTube Video Download: Video download via yt-dlp
  • 🎙️ High-Precision Speech Recognition: Word-level and low-hallucination subtitle recognition using WhisperX
  • 📝 Intelligent Subtitle Segmentation: Subtitle segmentation based on NLP and AI technologies
  • 📚 Terminology Management: Custom + AI-generated glossary to ensure translation consistency
  • 🔄 Three-Step Translation Process: Movie-level quality through a translate-reflect-adapt workflow (see the sketch after this list)
  • ✨ Netflix-Standard Subtitles: Generates only single-line subtitles, compliant with Netflix standards
  • 🗣️ Multi-Engine Dubbing: Supports multiple dubbing engines such as GPT-SoVITS, Azure, OpenAI, etc.
  • 🚀 One-Click Launch: One-click launch and processing via Streamlit
  • 🌍 Multi-Language Interface: Streamlit UI supports multiple languages
  • 📝 Detailed Logs: Detailed log system supporting progress recovery
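
The three-step translation above can be pictured as a chain of LLM calls. Below is a minimal sketch assuming an OpenAI-compatible client; the prompts, model name, and function names are illustrative assumptions, not the project's actual prompts:

```python
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint works via base_url=...

def ask(prompt: str) -> str:
    """Single chat-completion call; the model name is an assumption."""
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def three_step_translate(line: str, target_lang: str = "Chinese") -> str:
    # Step 1: faithful first-pass translation
    draft = ask(f"Translate into {target_lang}: {line}")
    # Step 2: reflection: critique fluency, terminology, and subtitle length
    critique = ask(f"List problems in this {target_lang} subtitle "
                   f"translation of '{line}':\n{draft}")
    # Step 3: adaptation: rewrite the draft using the critique
    return ask(f"Rewrite the translation '{draft}' of '{line}', "
               f"fixing these issues:\n{critique}")
```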

Differentiation from Similar Projects

  • Generates Only Single-Line Subtitles: Compliant with professional standards
  • Superior Translation Quality: Multi-step translation process ensures quality
  • Seamless Dubbing Experience: Multiple TTS engine options

🌍 Supported Languages

Input Language Support

  • 🇺🇸 English 🤩
  • 🇷🇺 Russian 😊
  • 🇫🇷 French 🤩
  • 🇩🇪 German 🤩
  • 🇮🇹 Italian 🤩
  • 🇪🇸 Spanish 🤩
  • 🇯🇵 Japanese 😐
  • 🇨🇳 Chinese* 😊

*Chinese uses a separate punctuation-enhanced Whisper model.

Translation supports all languages, while dubbing language support depends on the selected TTS method.
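
As a rough illustration of the Chinese footnote, model selection amounts to picking a Whisper checkpoint by source language; a minimal sketch, where the checkpoint IDs are assumptions rather than confirmed project settings:

```python
def pick_whisper_model(source_lang: str) -> str:
    """Return a Whisper checkpoint ID for the given source language."""
    if source_lang == "zh":
        # Assumption: a community checkpoint fine-tuned for Chinese
        # that restores punctuation in its transcripts
        return "BELLE-2/Belle-whisper-large-v3-zh-punct"
    return "large-v3"  # default multilingual WhisperX model
```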

🔧 Installation Requirements

System Requirements

  • Python 3.10
  • FFmpeg
  • CUDA support (Windows NVIDIA GPU users)

Windows NVIDIA GPU Users Pre-Installation Steps

  1. Install CUDA Toolkit 12.6
  2. Install CUDNN 9.3.0
  3. Add C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6 to the system PATH
  4. Restart the computer
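
After restarting, a quick check with PyTorch (which the project depends on) confirms that CUDA and cuDNN are visible:

```python
import torch

# Should print True and the CUDA version if the toolkit,
# cuDNN, and PATH entry from the steps above are correct
print(torch.cuda.is_available())
print(torch.version.cuda)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```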

FFmpeg Installation

  • Windows: choco install ffmpeg (via Chocolatey)
  • macOS: brew install ffmpeg (via Homebrew)
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu)
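
To confirm FFmpeg is visible to Python after installation, a quick standard-library check:

```python
import shutil
import subprocess

# Prints the path to the ffmpeg binary, or None if it is not on PATH
print(shutil.which("ffmpeg"))

# Raises CalledProcessError if ffmpeg cannot be executed
subprocess.run(["ffmpeg", "-version"], check=True)
```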

📥 Installation Steps

1. Clone the Repository

git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo

2. Install Dependencies (Requires python=3.10)

conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py

3. Launch the Application

streamlit run st.py

Docker Installation (Optional)

docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo

Requires CUDA 12.4 and NVIDIA driver version >550

🔌 API Support

VideoLingo supports the OpenAI-compatible API format and various TTS interfaces:

LLM Support

  • claude-3-5-sonnet
  • gpt-4.1
  • deepseek-v3
  • gemini-2.0-flash
  • ... (ordered by performance; use gemini-2.5-flash with caution)
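
Because the LLM side speaks the OpenAI-compatible format, any of the models above can be reached by pointing the official openai client at the provider's endpoint; the URL and key below are placeholders:

```python
from openai import OpenAI

# Any OpenAI-compatible provider works; URL and key are placeholders
client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="deepseek-v3",  # one of the models listed above
    messages=[{"role": "user", "content": "Translate 'hello' into French."}],
)
print(resp.choices[0].message.content)
```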

WhisperX Options

  • Run whisperX locally (large-v3)
  • Use 302.ai API
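
For the local route, WhisperX's documented transcribe-then-align API looks roughly like this; the file name and batch size are placeholders:

```python
import whisperx

device = "cuda"  # use "cpu" (with compute_type="int8") if no GPU
model = whisperx.load_model("large-v3", device, compute_type="float16")

audio = whisperx.load_audio("input.wav")
result = model.transcribe(audio, batch_size=16)

# Word-level alignment, which enables the precise subtitle timing
# described in the features above
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata,
                        audio, device)
print(result["segments"][0])
```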

TTS Engines

  • azure-tts
  • openai-tts
  • siliconflow-fishtts
  • fish-tts
  • GPT-SoVITS
  • edge-tts
  • custom-tts (implement your own engine in custom_tts.py; a hypothetical adapter is sketched after this list)
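
The custom-tts option points at custom_tts.py; the exact signature the pipeline expects should be verified in that file, but the shape is roughly a function that synthesizes text to a target audio path. A hypothetical sketch:

```python
def custom_tts(text: str, save_path: str) -> None:
    """Hypothetical adapter: call your own TTS service and save the audio.

    The real function signature lives in custom_tts.py; check it there.
    The endpoint below is a placeholder for a service of your own.
    """
    import requests

    resp = requests.post("https://tts.example.com/synthesize",
                         json={"text": text})
    resp.raise_for_status()
    with open(save_path, "wb") as f:
        f.write(resp.content)
```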

Convenient Options

  • Use 302.ai: a single API key gives access to all services (LLM, WhisperX, TTS)
  • Run Ollama and Edge-TTS locally: completely free, no API key required (see the sketch below)
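
The free local route works because Ollama exposes an OpenAI-compatible endpoint on port 11434; a minimal sketch, where the model name is whatever you have pulled locally:

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; no real key is needed,
# but the client requires a non-empty string
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen2.5",  # assumption: any model fetched via `ollama pull`
    messages=[{"role": "user", "content": "Say hello in Japanese."}],
)
print(resp.choices[0].message.content)
```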

⚠️ Known Limitations

  1. Audio Quality Impact: WhisperX transcription performance may be affected by video background noise. For videos with significant background music, enable vocal separation enhancement.

  2. Numeric Character Handling: Subtitles ending with numbers or special characters may be truncated early because wav2vec cannot map numeric characters (e.g., "1") to their spoken form (e.g., "one"). A workaround is sketched after this list.

  3. Model Compatibility: Using weaker models may cause errors during processing due to strict JSON format requirements. A validation-and-retry sketch follows this list.

  4. Dubbing Perfection: Due to differences in speech rate and intonation between languages, as well as the impact of translation steps, the dubbing function may not be 100% perfect.

  5. Multi-Language Recognition: Transcription of multi-language videos retains only the primary language.

  6. Multi-Character Dubbing: Currently, it is not possible to dub multiple characters separately because whisperX's speaker diarization capability is not reliable enough.
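
A common workaround for limitation 2 is to spell out digits before alignment so wav2vec sees pronounceable tokens; a minimal sketch using the num2words package (not necessarily what the project does internally):

```python
import re
from num2words import num2words

def spell_out_digits(text: str, lang: str = "en") -> str:
    """Replace bare integers with their spoken form, e.g. '1' -> 'one'."""
    return re.sub(r"\d+",
                  lambda m: num2words(int(m.group()), lang=lang),
                  text)

print(spell_out_digits("Chapter 1 ends at scene 12"))
# Chapter one ends at scene twelve
```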
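
For limitation 3, validating the model's JSON output and retrying on failure makes weaker models more tolerable; a minimal sketch, where `ask` is any prompt-to-text LLM call such as the one sketched earlier:

```python
import json

def ask_json(ask, prompt: str, retries: int = 3) -> dict:
    """Call an LLM via `ask` until it returns valid JSON.

    `ask` is any callable mapping a prompt string to a response string.
    """
    for _ in range(retries):
        raw = ask(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            # Feed the parse error back so the model can self-correct
            prompt = (f"{prompt}\n\nYour previous output was not valid "
                      f"JSON ({e}). Output ONLY valid JSON.")
    raise ValueError("model never produced valid JSON")
```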

Star History Chart