Huanshere/VideoLingoPlease refer to the latest official releases for information GitHub Homepage

Netflix-level video translation, localization, and dubbing tool, realizing AI subtitle cutting, translation, alignment, and dubbing with one click.

Apache-2.0Python 13.7kHuanshereVideoLingo Last Updated: 2025-05-18

VideoLingo - Netflix-Level AI Video Translation, Subtitling, and Dubbing Tool

🌟 Project Overview

VideoLingo is a comprehensive tool integrating video translation, localization, and dubbing functionalities, aiming to generate Netflix-level quality subtitles. This project eliminates awkward machine translations and multi-line subtitle issues, while adding high-quality dubbing, enabling global knowledge sharing across language barriers.

🎯 Core Features

Key Features

🎥 YouTube Video Download: Video download via yt-dlp
🎙️ High-Precision Speech Recognition: Word-level and low-hallucination subtitle recognition using WhisperX
📝 Intelligent Subtitle Segmentation: Subtitle segmentation based on NLP and AI technologies
📚 Terminology Management: Custom + AI-generated glossary to ensure translation consistency
🔄 Three-Step Translation Process: Movie-level quality processing with translation-reflection-adaptation
✅ Netflix Standard Subtitles: Generates only single-line subtitles, compliant with Netflix standards
🗣️ Multi-Engine Dubbing: Supports multiple dubbing engines such as GPT-SoVITS, Azure, OpenAI, etc.
🚀 One-Click Launch: One-click launch and processing via Streamlit
🌍 Multi-Language Interface: Streamlit UI supports multiple languages
📝 Detailed Logs: Detailed log system supporting progress recovery

Differentiation from Similar Projects

Generates Only Single-Line Subtitles: Compliant with professional standards
Superior Translation Quality: Multi-step translation process ensures quality
Seamless Dubbing Experience: Multiple TTS engine options

🌍 Supported Languages

Input Language Support

🇺🇸 English 🤩
🇷🇺 Russian 😊
🇫🇷 French 🤩
🇩🇪 German 🤩
🇮🇹 Italian 🤩
🇪🇸 Spanish 🤩
🇯🇵 Japanese 😐
🇨🇳 Chinese* 😊

*Chinese uses a separate punctuation-enhanced whisper model

Translation supports all languages, and dubbing languages depend on the selected TTS method.

🔧 Installation Requirements

System Requirements

Python 3.10
FFmpeg
CUDA support (Windows NVIDIA GPU users)

Windows NVIDIA GPU Users Pre-Installation Steps

Install CUDA Toolkit 12.6
Install CUDNN 9.3.0
Add C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6 to the system PATH
Restart the computer

FFmpeg Installation

Windows: choco install ffmpeg (via Chocolatey)
macOS: brew install ffmpeg (via Homebrew)
Linux: sudo apt install ffmpeg (Debian/Ubuntu)

📥 Installation Steps

1. Clone the Repository

git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo

2. Install Dependencies (Requires python=3.10)

conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py

3. Launch the Application

streamlit run st.py

Docker Installation (Optional)

docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo

Requires CUDA 12.4 and NVIDIA driver version >550

🔌 API Support

VideoLingo supports OpenAI-Like API format and various TTS interfaces:

LLM Support

claude-3-5-sonnet
gpt-4.1
deepseek-v3
gemini-2.0-flash
... (Ordered by performance, use gemini-2.5-flash with caution)

WhisperX Options

Run whisperX locally (large-v3)
Use 302.ai API

TTS Engines

azure-tts
openai-tts
siliconflow-fishtts
fish-tts
GPT-SoVITS
edge-tts
*custom-tts (Can modify custom TTS in custom_tts.py)

Convenient Options

Use 302.ai one API key to access all services (LLM, WhisperX, TTS)
Run Ollama and Edge-TTS locally completely free, no API required

⚠️ Known Limitations

Audio Quality Impact: WhisperX transcription performance may be affected by video background noise. For videos with significant background music, enable vocal separation enhancement.
Numeric Character Handling: Subtitles ending with numbers or special characters may be truncated early because wav2vac cannot map numeric characters (e.g., "1") to their spoken form (e.g., "one").
Model Compatibility: Using weaker models may cause errors during processing due to strict JSON format requirements.
Dubbing Perfection: Due to differences in speech rate and intonation between languages, as well as the impact of translation steps, the dubbing function may not be 100% perfect.
Multi-Language Recognition: Multi-language video transcription recognition will only retain the primary language.
Multi-Character Dubbing: Currently, it is not possible to dub multiple characters separately because whisperX's speaker diarization capability is not reliable enough.