Dia is a 1.6-billion-parameter text-to-speech (TTS) model developed by Nari Labs and designed to generate highly realistic dialogue directly from text scripts. Unlike traditional TTS models, Dia targets multi-speaker dialogue and aims to capture the natural flow and back-and-forth of conversation.
The project is released under the Apache 2.0 open-source license, with the goal of accelerating speech synthesis research and giving researchers, developers, and content creators a powerful tool.

| Precision | Real-time multiple (compiled) | Real-time multiple (uncompiled) | Memory usage |
|---|---|---|---|
| bfloat16 | x2.1 | x1.5 | ~10 GB |
| float16 | x2.2 | x1.3 | ~10 GB |
| float32 | x1 | x0.9 | ~13 GB |
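
To make the table easier to read, here is a minimal sketch that turns a real-time multiple into an estimated generation time. It assumes that a multiple of x2.1 means the model produces 2.1 seconds of audio per second of wall-clock time; the source does not define the term, so treat that reading as an assumption. The function name `estimated_generation_seconds` is hypothetical and is not part of Dia.

```python
# Real-time multiples copied from the table above: (compiled, uncompiled).
REAL_TIME_MULTIPLE = {
    "bfloat16": (2.1, 1.5),
    "float16": (2.2, 1.3),
    "float32": (1.0, 0.9),
}


def estimated_generation_seconds(audio_seconds, precision, compiled=True):
    """Estimate wall-clock seconds needed to generate `audio_seconds` of audio.

    Assumption: real-time multiple = seconds of audio produced per second
    of wall-clock time, so generation time = audio length / multiple.
    """
    compiled_x, uncompiled_x = REAL_TIME_MULTIPLE[precision]
    multiple = compiled_x if compiled else uncompiled_x
    return audio_seconds / multiple


if __name__ == "__main__":
    clip_seconds = 60  # hypothetical one-minute dialogue
    for precision in REAL_TIME_MULTIPLE:
        for compiled in (True, False):
            label = "compiled" if compiled else "uncompiled"
            seconds = estimated_generation_seconds(clip_seconds, precision, compiled)
            print(f"{precision:>8} {label:<10} ~{seconds:.1f} s")
```

Under this reading, any multiple below x1 (float32 uncompiled in the table) means generation is slower than playback, which matters for streaming or interactive use.
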
Dia is a significant step forward for open-source TTS, particularly for dialogue generation. It offers quality comparable to commercial solutions such as ElevenLabs while being fully open source and deployable locally. For researchers and developers who need high-quality speech synthesis, it is a powerful and flexible option.