Spark-TTS is a text-to-speech (TTS) system built on a large language model (LLM) and developed by the SparkAudio team. It represents speech as a single stream of decoupled speech tokens, which the LLM predicts directly from text, to produce high-quality, natural-sounding speech. The model is built on the Qwen2.5 LLM and targets both research and production use, with an emphasis on efficiency and flexible control over the generated voice.
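The following toy Python sketch shows what a single stream of decoupled tokens can look like. Spark-TTS describes its codec (BiCodec) as splitting speech into fixed-length global tokens, which carry speaker attributes, and semantic tokens, which carry linguistic content. The sketch flattens the text and both token groups into one sequence and then parses them back. The marker strings, the `g`/`s` prefixes, and the helper names (`SpeechTokens`, `build_stream`, `parse_stream`) are hypothetical. They do not reproduce the project's actual special tokens or code.

```python
"""Toy illustration of a single-stream, decoupled speech-token sequence.

Assumption: the marker strings are hypothetical placeholders, not the
special tokens actually used by Spark-TTS.
"""
from dataclasses import dataclass
from typing import List

# Hypothetical markers delimiting each segment of the single stream.
START_TEXT, END_TEXT = "<text>", "</text>"
START_GLOBAL, END_GLOBAL = "<global>", "</global>"
START_SEMANTIC, END_SEMANTIC = "<semantic>", "</semantic>"


@dataclass
class SpeechTokens:
    global_ids: List[int]    # fixed-length codes for speaker attributes
    semantic_ids: List[int]  # variable-length codes for linguistic content


def build_stream(text: str, tokens: SpeechTokens) -> List[str]:
    """Flatten text plus both token groups into one sequence an LLM could model."""
    return (
        [START_TEXT, text, END_TEXT]
        + [START_GLOBAL] + [f"g{i}" for i in tokens.global_ids] + [END_GLOBAL]
        + [START_SEMANTIC] + [f"s{i}" for i in tokens.semantic_ids] + [END_SEMANTIC]
    )


def parse_stream(stream: List[str]) -> SpeechTokens:
    """Recover the two decoupled token groups from a flat stream."""
    def segment(start: str, end: str, prefix: str) -> List[int]:
        i, j = stream.index(start), stream.index(end)
        return [int(tok[len(prefix):]) for tok in stream[i + 1 : j]]

    return SpeechTokens(
        global_ids=segment(START_GLOBAL, END_GLOBAL, "g"),
        semantic_ids=segment(START_SEMANTIC, END_SEMANTIC, "s"),
    )


if __name__ == "__main__":
    original = SpeechTokens(global_ids=[7, 3, 9], semantic_ids=[12, 12, 40, 5])
    stream = build_stream("Hello world", original)
    print(stream)
    assert parse_stream(stream) == original  # round trip preserves both groups
```

Because the speaker and content tokens occupy separate, delimited segments of one sequence, a single autoregressive model can generate them in order. The two groups still stay independently recoverable.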
```bash
# Clone the repository
git clone https://github.com/SparkAudio/Spark-TTS.git
cd Spark-TTS
# Create a Conda environment
conda create -n sparktts -y python=3.12
conda activate sparktts
pip install -r requirements.txt
```
```python
# Download via Python
from huggingface_hub import snapshot_download

snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")
```
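Once the weights are downloaded, the README's command-line example is run from the repository root. The sketch below assembles that command in Python so you can log it or pass it to `subprocess.run`. It assumes the flag names match the `cli.inference` example in the project README at the time of writing. `build_inference_cmd` is a hypothetical helper, so check `cli/inference.py` in your checkout before relying on it.

```python
"""Assemble a Spark-TTS command-line inference call (sketch).

Assumption: flag names follow the README's `cli.inference` example;
verify them against cli/inference.py in your checkout.
"""
import shlex
import sys
from typing import List, Optional


def build_inference_cmd(
    text: str,
    save_dir: str,
    model_dir: str = "pretrained_models/Spark-TTS-0.5B",  # matches local_dir above
    device: int = 0,
    prompt_text: Optional[str] = None,
    prompt_speech_path: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `python -m cli.inference` (run from the repo root)."""
    cmd = [
        sys.executable, "-m", "cli.inference",
        "--text", text,
        "--device", str(device),
        "--save_dir", save_dir,
        "--model_dir", model_dir,
    ]
    # Optional voice-cloning inputs: a transcript and a reference audio clip.
    if prompt_text is not None:
        cmd += ["--prompt_text", prompt_text]
    if prompt_speech_path is not None:
        cmd += ["--prompt_speech_path", prompt_speech_path]
    return cmd


if __name__ == "__main__":
    cmd = build_inference_cmd("Text to synthesize.", save_dir="outputs")
    print(shlex.join(cmd))  # inspect the command before running it
    # To actually run inference: subprocess.run(cmd, check=True)
```

Passing an argument list to `subprocess.run`, rather than a single shell string, avoids quoting problems when the input text contains spaces or punctuation.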
The project explicitly stipulates usage guidelines:
Spark-TTS is an advanced text-to-speech system that reflects recent progress in LLM-based TTS. By combining an LLM backbone with decoupled speech tokens, it delivers high voice quality and flexible control over the generated speech while remaining efficient. It is well suited to academic research and also has potential for practical applications, which makes it a notable contribution to the field of speech synthesis.