VideoLingo - Netflix-Level AI Video Translation, Subtitling, and Dubbing Tool
A one-stop video translation, localization, and dubbing tool that produces Netflix-quality subtitles: AI subtitle segmentation, translation, alignment, and dubbing in one click.
🌟 Project Overview
VideoLingo is a comprehensive tool integrating video translation, localization, and dubbing functionalities, aiming to generate Netflix-level quality subtitles. This project eliminates awkward machine translations and multi-line subtitle issues, while adding high-quality dubbing, enabling global knowledge sharing across language barriers.
🎯 Core Features
Key Features
- 🎥 YouTube Video Download: Video download via yt-dlp
- 🎙️ High-Precision Speech Recognition: Word-level and low-hallucination subtitle recognition using WhisperX
- 📝 Intelligent Subtitle Segmentation: Subtitle segmentation based on NLP and AI technologies
- 📚 Terminology Management: Custom + AI-generated glossary to ensure translation consistency
- 🔄 Three-Step Translation Process: Translate, reflect, and adapt for movie-grade translation quality
- ✅ Netflix Standard Subtitles: Generates only single-line subtitles, compliant with Netflix standards
- 🗣️ Multi-Engine Dubbing: Supports multiple dubbing engines such as GPT-SoVITS, Azure, OpenAI, etc.
- 🚀 One-Click Launch: One-click launch and processing via Streamlit
- 🌍 Multi-Language Interface: Streamlit UI supports multiple languages
- 📝 Detailed Logs: Detailed log system supporting progress recovery
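The three-step translation process above can be sketched roughly as follows. This is a hypothetical illustration, not VideoLingo's actual implementation: `call_llm` stands in for whatever chat-completion client is configured, and the prompts are simplified examples.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an OpenAI-compatible API.
    return f"[LLM output for: {prompt[:40]}...]"

def three_step_translate(subtitle: str, target_lang: str, call=call_llm) -> str:
    # Step 1 - translation: produce a first draft.
    draft = call(f"Translate into {target_lang}: {subtitle}")
    # Step 2 - reflection: have the model critique its own draft.
    critique = call(f"List issues (fluency, terminology) in this translation: {draft}")
    # Step 3 - adaptation: rewrite the draft using the critique.
    return call(f"Rewrite the translation fixing these issues: {critique}\nDraft: {draft}")
```

Splitting translation into draft, critique, and rewrite passes is what lets each step stay focused, at the cost of three LLM calls per subtitle batch.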
Differentiation from Similar Projects
- Generates Only Single-Line Subtitles: Compliant with professional standards
- Superior Translation Quality: Multi-step translation process ensures quality
- Seamless Dubbing Experience: Multiple TTS engine options
🌍 Supported Languages
Input Language Support
- 🇺🇸 English 🤩
- 🇷🇺 Russian 😊
- 🇫🇷 French 🤩
- 🇩🇪 German 🤩
- 🇮🇹 Italian 🤩
- 🇪🇸 Spanish 🤩
- 🇯🇵 Japanese 😐
- 🇨🇳 Chinese* 😊
*Chinese uses a separate punctuation-enhanced Whisper model
Translation supports all languages; dubbing language support depends on the selected TTS method.
🔧 Installation Requirements
System Requirements
- Python 3.10
- FFmpeg
- CUDA support (Windows NVIDIA GPU users)
Windows NVIDIA GPU Users: Pre-Installation Steps
- Install CUDA Toolkit 12.6
- Install cuDNN 9.3.0
- Add `C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6` to the system PATH
- Restart the computer
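As a quick sanity check before launching, you can verify that the cuDNN directory actually made it onto PATH. This helper is a hypothetical convenience, not part of VideoLingo; adjust the directory to match your install.

```python
import os

def dir_on_path(directory: str, path_env=None) -> bool:
    """Return True if `directory` appears as an entry in the PATH string."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    # Normalize case and trailing slashes so comparisons work on Windows too.
    entries = [os.path.normcase(p.rstrip("\\/")) for p in path_env.split(os.pathsep)]
    return os.path.normcase(directory.rstrip("\\/")) in entries

# Example with the Windows-style path from the steps above:
# dir_on_path(r"C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6")
```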
FFmpeg Installation
- Windows: `choco install ffmpeg` (via Chocolatey)
- macOS: `brew install ffmpeg` (via Homebrew)
- Linux: `sudo apt install ffmpeg` (Debian/Ubuntu)
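Since the pipeline shells out to FFmpeg, a pre-flight check that the binary is discoverable can save a failed run later. This small helper is an illustrative suggestion, not part of the project:

```python
import shutil

def tool_available(name: str) -> bool:
    """True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None

if not tool_available("ffmpeg"):
    print("ffmpeg not found - install it with your package manager first")
```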
📥 Installation Steps
1. Clone the Repository
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
2. Install Dependencies (Requires python=3.10)
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
3. Launch the Application
streamlit run st.py
Docker Installation (Optional)
docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo
Requires CUDA 12.4 and NVIDIA driver version >550
🔌 API Support
VideoLingo supports OpenAI-Like API format and various TTS interfaces:
LLM Support
- claude-3-5-sonnet
- gpt-4.1
- deepseek-v3
- gemini-2.0-flash
- ... (ordered by performance; use gemini-2.5-flash with caution)
WhisperX Options
- Run whisperX locally (large-v3)
- Use 302.ai API
TTS Engines
- azure-tts
- openai-tts
- siliconflow-fishtts
- fish-tts
- GPT-SoVITS
- edge-tts
- custom-tts* (*custom TTS can be modified in custom_tts.py)
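A custom TTS hook typically just needs to turn text into an audio file at a given path. The sketch below shows one hypothetical shape for such a function; the real entry point lives in `custom_tts.py`, so adapt the name and signature to whatever that file actually expects. Here the "synthesis" is a placeholder that writes one second of silence so downstream steps still receive a valid WAV file.

```python
import wave

def custom_tts(text: str, save_path: str) -> None:
    """Hypothetical custom-TTS hook: synthesize `text` into a WAV at `save_path`."""
    # Placeholder output: 1 second of 16 kHz, 16-bit mono silence.
    # A real implementation would call your TTS engine of choice here.
    with wave.open(save_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
```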
Convenient Options
- Use a single 302.ai API key to access all services (LLM, WhisperX, TTS)
- Run Ollama and Edge-TTS locally for a completely free setup with no API key required
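Because VideoLingo speaks the OpenAI-like API format, a local Ollama server can stand in for a paid LLM. The sketch below builds the request a client would send; `http://localhost:11434/v1` is Ollama's standard OpenAI-compatible base URL, while the model name is just an example and depends on what you have pulled locally.

```python
def openai_like_request(prompt: str,
                        model: str = "llama3",
                        base_url: str = "http://localhost:11434/v1"):
    """Build the URL and JSON payload for an OpenAI-compatible chat request."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload
```

Any HTTP client can then POST `payload` as JSON to `url`; no API key is needed for a local Ollama instance.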
⚠️ Known Limitations
Audio Quality Impact: WhisperX transcription performance may be affected by video background noise. For videos with significant background music, enable vocal separation enhancement.
Numeric Character Handling: Subtitles ending with numbers or special characters may be truncated early because wav2vec alignment cannot map numeric characters (e.g., "1") to their spoken form (e.g., "one").
Model Compatibility: Using weaker models may cause errors during processing due to strict JSON format requirements.
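Weaker models often wrap JSON in markdown fences or add commentary, which breaks strict parsing. A tolerant parser, sketched below as a general technique rather than VideoLingo's actual code, recovers the object in the common failure modes:

```python
import json
import re

def parse_llm_json(raw: str):
    """Parse JSON from LLM output, tolerating code fences and surrounding prose."""
    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the first {...} span embedded in the text.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise
```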
Dubbing Imperfections: Differences in speech rate and intonation between languages, plus shifts introduced by the translation steps, mean the dubbing output may not be 100% perfect.
Multi-Language Recognition: When a video contains multiple spoken languages, transcription retains only the primary language.
Multi-Character Dubbing: Currently, it is not possible to dub multiple characters separately because whisperX's speaker diarization capability is not reliable enough.