
The first production-grade open-source text-to-speech model with support for emotional exaggeration control and zero-shot voice synthesis.

License: MIT | Language: Python | Stars: 8.7k | Repository: resemble-ai/chatterbox | Last Updated: 2025-06-13

Chatterbox - Open Source Text-to-Speech Model

Project Overview

Chatterbox, developed by Resemble AI, is the first production-grade open-source text-to-speech (TTS) model. It is released under the MIT license, and Resemble AI reports that it consistently outperforms leading closed-source systems such as ElevenLabs in side-by-side evaluations.

Core Features

🎯 Technical Advantages

  • State-of-the-Art Zero-Shot TTS: Generates high-quality speech in a new voice from a reference audio clip, with no speaker-specific training.
  • 500-Million-Parameter Llama Backbone: A robust architecture that underpins generation quality.
  • Unique Emotion Exaggeration/Intensity Control: The first open-source TTS model to support emotion exaggeration control.
  • Ultra-Stable Alignment-Aware Inference: Keeps generated speech stable and consistent.
  • Large-Scale Training Data: Trained on 500,000 hours of cleaned data.
  • Built-in Watermarking: All generated audio carries a Perth (Perceptual Threshold) watermark.

🚀 Performance

  • Outperforms ElevenLabs: Performs better in comparative tests on the Podonos platform.
  • Low Latency: The commercial version supports ultra-low latency below 200 ms.
  • High-Quality Synthesis: Trained on large-scale clean data, ensuring output quality.
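The sub-200 ms figure applies to Resemble AI's commercial service, so it is not something to expect from a local run of the open-source model. Still, it can help to measure what a local setup actually delivers. The minimal sketch below times one call and computes the real-time factor (synthesis time divided by the duration of the audio produced; below 1.0 means faster than playback). The helpers `timed_call` and `real_time_factor` are hypothetical, and the stand-in generator is only there so the snippet runs without a model. Note that timing a complete, non-streaming call measures total synthesis time, not time to first audio.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, wall-clock seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def real_time_factor(elapsed_s: float, num_samples: int, sample_rate: int) -> float:
    """Synthesis time divided by the duration of the generated audio.

    Values below 1.0 mean audio is produced faster than it plays back.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_s / audio_seconds


if __name__ == "__main__":
    # Stand-in for model.generate (hypothetical): 2 seconds of silence at 24 kHz.
    def fake_generate(text: str) -> list:
        return [0.0] * 48_000

    wav, secs = timed_call(fake_generate, "Hello there.")
    rtf = real_time_factor(secs, len(wav), 24_000)
    print(f"elapsed: {secs * 1000:.1f} ms, real-time factor: {rtf:.4f}")
```

With Chatterbox loaded, the same pattern would be `wav, secs = timed_call(model.generate, text)`, then `real_time_factor(secs, wav.shape[-1], model.sr)`, assuming the last dimension of the returned tensor is the sample axis.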

Application Scenarios

Chatterbox is suitable for various application scenarios:

  • Content Creation: Meme creation, video dubbing.
  • Game Development: Character voices, game narration.
  • AI Agents: Intelligent assistants, chatbots.
  • Interactive Media: Interactive applications, educational content.
  • Voice Conversion: Voice style transfer.

Installation and Usage

Quick Installation

pip install chatterbox-tts

Basic Usage Example

import torch
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Initialize the model (use the GPU when one is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ChatterboxTTS.from_pretrained(device=device)

# Generate speech in the default voice and save it at the model's sample rate
text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# Use an audio prompt for zero-shot voice cloning
# (replace YOUR_FILE.wav with the path to your reference clip)
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
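Loading the model is the expensive step, so a script with many lines (game narration, dubbing) would typically load it once and loop. The sketch below shows that pattern under stated assumptions: `render_lines` is a hypothetical helper, and it only assumes what the usage example already shows, namely a `generate(text, **kwargs)` method, an `sr` attribute, and a save function with the argument order of `torchaudio.save(path, wav, sample_rate)`. Passing the save function in keeps the helper testable without torch.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional


def render_lines(
    model: Any,
    lines: Iterable[str],
    save: Callable[[str, Any, int], None],
    out_dir: str = ".",
    audio_prompt_path: Optional[str] = None,
    prefix: str = "line",
) -> List[str]:
    """Synthesize each line with an already-loaded model and save it.

    out_dir must already exist. Returns the list of written file paths.
    """
    kwargs: Dict[str, Any] = {}
    if audio_prompt_path is not None:
        # Clone the same reference voice for every line.
        kwargs["audio_prompt_path"] = audio_prompt_path
    paths: List[str] = []
    for i, text in enumerate(lines, start=1):
        wav = model.generate(text, **kwargs)
        path = str(Path(out_dir) / f"{prefix}-{i:03d}.wav")
        save(path, wav, model.sr)
        paths.append(path)
    return paths


# Usage with the model loaded above (not run here):
# import torchaudio as ta
# render_lines(model, ["First line.", "Second line."], ta.save,
#              audio_prompt_path="YOUR_FILE.wav")
```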

Parameter Tuning Guide

General Use (TTS and Voice Agents)

  • Default Settings: The defaults (exaggeration=0.5, cfg_weight=0.5) work well for most prompts.
  • Fast Speaking Style: If the reference speaker talks quickly, lowering cfg_weight to around 0.3 can improve pacing.

Expressive or Dramatic Speech

  • Lower CFG Weight: Try a lower cfg_weight value (e.g., ~0.3).
  • Higher Exaggeration: Raise exaggeration to around 0.7 or higher.
  • Pacing Compensation: Higher exaggeration tends to speed up speech; lowering cfg_weight compensates with a slower, more deliberate rhythm.
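The tuning advice can be captured as a few named starting points. This is a minimal sketch: the preset names, the `PRESETS` table, and `generation_settings` are hypothetical and not part of the chatterbox package. It assumes the keyword arguments of `generate()` are `exaggeration` and `cfg_weight` (the latter is what the guide abbreviates as "cfg"). The helper returns a plain dict you can unpack into the call and lets you override one value while experimenting.

```python
from typing import Dict

# Starting points from the tuning guide; names are illustrative only.
PRESETS: Dict[str, Dict[str, float]] = {
    "default": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "fast_speaker": {"exaggeration": 0.5, "cfg_weight": 0.3},
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def generation_settings(style: str = "default", **overrides: float) -> Dict[str, float]:
    """Return keyword arguments for model.generate(), with optional overrides."""
    if style not in PRESETS:
        raise KeyError(f"unknown style {style!r}; choose from {sorted(PRESETS)}")
    settings = dict(PRESETS[style])  # copy so callers cannot mutate the presets
    for name, value in overrides.items():
        if name not in settings:
            raise KeyError(f"unknown parameter {name!r}")
        settings[name] = value
    return settings


# Usage (not run here):
# wav = model.generate(text, **generation_settings("expressive", exaggeration=0.8))
```

Rejecting unknown names is a deliberate choice: a misspelled keyword fails loudly here instead of reaching `generate()`.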

Technical Architecture

Model Architecture

  • Backbone Network: A 500-million-parameter model based on the Llama architecture.
  • Training Data: 500,000 hours of high-quality clean data.
  • Inference Optimization: Alignment-aware inference technology ensures stability.
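To put the headline numbers in perspective, a quick back-of-envelope calculation helps. The sketch below assumes 4 bytes per parameter in fp32 and 2 in fp16/bf16, and 1 GB = 10^9 bytes. It covers only the backbone weights; other model components, activations, and framework overhead add to the real memory footprint. The helper names are hypothetical.

```python
from typing import Dict

BYTES_PER_PARAM: Dict[str, int] = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str = "fp32") -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def hours_to_years(hours: float) -> float:
    """Convert hours of audio to years of continuous playback."""
    return hours / (24 * 365.25)


if __name__ == "__main__":
    backbone_params = 500e6   # 500 million parameters (Llama backbone)
    training_hours = 500_000  # hours of training audio
    print(f"backbone weights, fp32: {weight_memory_gb(backbone_params):.1f} GB")
    print(f"backbone weights, fp16: {weight_memory_gb(backbone_params, 'fp16'):.1f} GB")
    print(f"training audio: about {hours_to_years(training_hours):.0f} years")
```

This prints 2.0 GB for fp32 backbone weights, 1.0 GB for fp16, and about 57 years of continuous audio for the training set.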

Security Features

  • Built-in Watermark: Uses Resemble AI's Perth (Perceptual Threshold) watermarking technology.
  • Detection Accuracy: The watermark is still detected with nearly 100% accuracy after MP3 compression, audio editing, and other common manipulations.
  • Transparency: Open-source model provides complete transparency and control.
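"Nearly 100% detection accuracy after MP3 compression" can be read as: of the watermarked clips that went through a given transformation, nearly all are still flagged by the detector. The sketch below computes that rate per transformation from per-clip detector scores. It assumes a detector that returns a score where 1.0 means "watermark present" (Perth's own detector is not shown here), and the 0.5 threshold, transformation names, and helper functions are hypothetical. It measures only the detection (true-positive) rate; a full evaluation would also run unwatermarked clips to check for false positives.

```python
from typing import Dict, Iterable


def detection_rate(scores: Iterable[float], threshold: float = 0.5) -> float:
    """Fraction of watermarked clips whose detector score is at or above threshold."""
    values = list(scores)
    if not values:
        raise ValueError("need at least one score")
    hits = sum(1 for s in values if s >= threshold)
    return hits / len(values)


def detection_report(
    scores_by_edit: Dict[str, Iterable[float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Detection rate per transformation, keyed like {"mp3": [...], "trim": [...]}."""
    return {edit: detection_rate(s, threshold) for edit, s in scores_by_edit.items()}
```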

Project Resources

Commercial Support

For users who need to scale or fine-tune for higher accuracy, Resemble AI offers competitively priced TTS services with the following features:

  • Reliable Performance: Stable production-grade service.
  • Ultra-Low Latency: Response time below 200ms.
  • Suitable Scenarios: Production use for agents, applications, or interactive media.

Usage Notice

This model should be used responsibly and not for malicious purposes. Training prompts are derived from freely available data on the internet.

Contribution and Community

As an open-source project, Chatterbox welcomes community contributions. Developers can participate in project development through GitHub, submit issue reports, or feature suggestions.
