RealtimeTTS is a Python text-to-speech (TTS) library designed for real-time applications that require low latency and high-quality audio output. It converts text streams into audio with minimal delay, making it well suited to voice assistants, AI dialogue systems, and accessibility tools.
Project Address: https://github.com/KoljaB/RealtimeTTS
RealtimeTTS supports multiple TTS engines, providing a wide range of choices. The simplest installation pulls in all of them:
pip install -U realtimetts[all]
You can choose specific engine support as needed:
# System TTS only
pip install realtimetts[system]
# Azure support
pip install realtimetts[azure]
# Multi-engine combination
pip install realtimetts[azure,elevenlabs,openai]
- `all`: Full installation, supports all engines.
- `system`: Local system TTS (pyttsx3).
- `azure`: Azure Speech Service support.
- `elevenlabs`: ElevenLabs API integration.
- `openai`: OpenAI TTS service.
- `gtts`: Google Text-to-Speech.
- `edge`: Microsoft Edge TTS.
- `coqui`: Coqui TTS engine.
- `minimal`: Core package only (for custom engine development).

The play_async() method supports non-blocking playback, while play() blocks until playback finishes. The library also provides rich callbacks for monitoring and control:
- `on_text_stream_start()`: Triggered when the text stream starts.
- `on_text_stream_stop()`: Triggered when the text stream ends.
- `on_audio_stream_start()`: Triggered when audio playback starts.
- `on_audio_stream_stop()`: Triggered when audio playback ends.
- `on_character()`: Triggered as each character is processed.
- `on_word()`: Word-level time synchronization (supported by the Azure and Kokoro engines).

A minimal usage example:

from RealtimeTTS import TextToAudioStream, SystemEngine
# Create engine and stream
engine = SystemEngine()
stream = TextToAudioStream(engine)
# Input text and play
stream.feed("Hello world! How are you today?")
stream.play_async()
# Process a string
stream.feed("Hello, this is a sentence.")
# Process a generator (suitable for LLM output)
def write(prompt: str):
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk
text_stream = write("A three-sentence relaxing speech.")
stream.feed(text_stream)
# Process a character iterator
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)
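To see why character- and generator-based feeding works well for streaming, here is a stdlib-only sketch of the idea behind sentence-based chunking — an illustration of the concept, not RealtimeTTS's internal tokenizer:

```python
def sentences_from_chars(chars, terminators=".!?"):
    """Accumulate streamed characters and yield complete sentences."""
    buffer = []
    for ch in chars:
        buffer.append(ch)
        if ch in terminators:
            yield "".join(buffer).strip()
            buffer = []
    # Flush any trailing text that never hit a terminator.
    tail = "".join(buffer).strip()
    if tail:
        yield tail

chunks = iter("Hello there! This is streamed. Bye")
print(list(sentences_from_chars(chunks)))
# → ['Hello there!', 'This is streamed.', 'Bye']
```

Because sentences are emitted as soon as their terminator arrives, synthesis of the first sentence can begin while the rest of the text is still being generated — the core of the low-latency design.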
import time

# Asynchronous playback
stream.play_async()
while stream.is_playing():
    time.sleep(0.1)
# Synchronous playback
stream.play()
# Control operations
stream.pause() # Pause
stream.resume() # Resume
stream.stop() # Stop
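Conceptually, play_async() hands playback to a background worker so the caller stays responsive, and pause/resume gate that worker. A minimal stdlib sketch of this pattern (an illustration, not RealtimeTTS's implementation):

```python
import queue
import threading
import time

class AsyncPlayer:
    """Toy player: a worker thread drains a queue of 'audio' items."""
    def __init__(self):
        self.items = queue.Queue()
        self.unpaused = threading.Event()
        self.unpaused.set()
        self.played = []
        self.worker = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            item = self.items.get()
            if item is None:          # sentinel: end of stream
                break
            self.unpaused.wait()      # blocks while paused
            self.played.append(item)  # stand-in for writing to the sound card

    def play_async(self, chunks):
        for c in chunks:
            self.items.put(c)
        self.items.put(None)
        self.worker.start()

    def pause(self):
        self.unpaused.clear()

    def resume(self):
        self.unpaused.set()

    def is_playing(self):
        return self.worker.is_alive()

player = AsyncPlayer()
player.play_async(["chunk-1", "chunk-2", "chunk-3"])
while player.is_playing():       # the main thread remains free for other work
    time.sleep(0.01)
print(player.played)             # → ['chunk-1', 'chunk-2', 'chunk-3']
```

The threading.Event gives pause/resume without busy-waiting: clearing it blocks the worker at `wait()`, setting it lets playback continue exactly where it stopped.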
stream = TextToAudioStream(
engine=engine, # TTS engine
on_text_stream_start=callback, # Text stream start callback
on_audio_stream_start=callback, # Audio stream start callback
output_device_index=None, # Audio output device
tokenizer="nltk", # Tokenizer selection
language="en", # Language code
muted=False, # Whether to mute
level=logging.WARNING # Log level
)
stream.play(
fast_sentence_fragment=True, # Fast sentence fragment processing
buffer_threshold_seconds=0.0, # Buffer threshold
minimum_sentence_length=10, # Minimum sentence length
log_synthesized_text=False, # Log synthesized text
reset_generated_text=True, # Reset generated text
output_wavfile=None, # Save to WAV file
on_sentence_synthesized=callback, # Sentence synthesis complete callback
before_sentence_synthesized=callback, # Before sentence synthesis callback
on_audio_chunk=callback # Audio chunk ready callback
)
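As one concrete use of these hooks, an on_audio_chunk-style callback can capture raw PCM as it arrives, for example into a WAV file via the stdlib wave module. This is a sketch; the sample rate, channel count, and sample width here are assumptions — match them to your engine's actual output format:

```python
import wave

def make_wav_chunk_writer(path, sample_rate=22050, channels=1, sample_width=2):
    """Return (callback, close): the callback appends raw PCM chunks to a WAV file."""
    wav = wave.open(path, "wb")
    wav.setnchannels(channels)
    wav.setsampwidth(sample_width)
    wav.setframerate(sample_rate)

    def on_audio_chunk(chunk: bytes):
        wav.writeframes(chunk)

    return on_audio_chunk, wav.close

# Simulated use with silent 16-bit samples:
callback, close = make_wav_chunk_writer("out.wav")
for _ in range(3):
    callback(b"\x00\x00" * 1024)   # 1024 silent frames per chunk
close()
```

The returned callback could then be passed as the on_audio_chunk argument shown above, giving a live recording of everything the stream plays.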
from RealtimeTTS import OpenAIEngine
engine = OpenAIEngine(
api_key="your-api-key", # Or set environment variable OPENAI_API_KEY
voice="alloy", # Optional: alloy, echo, fable, onyx, nova, shimmer
model="tts-1" # Or tts-1-hd
)
from RealtimeTTS import AzureEngine
engine = AzureEngine(
speech_key="your-speech-key", # Or set environment variable AZURE_SPEECH_KEY
service_region="your-region", # For example: "eastus"
voice_name="en-US-AriaNeural" # Azure voice name
)
from RealtimeTTS import CoquiEngine
engine = CoquiEngine(
voice="path/to/voice/sample.wav", # Voice cloning source file
language="en" # Language code
)
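When an application supports several of these engines, it helps to centralize the choice. This stdlib-only sketch picks an engine name based on which credentials are configured; the fallback order and the AZURE_SPEECH_REGION variable name are assumptions for illustration:

```python
import os

def choose_engine(env=os.environ):
    """Pick an engine name based on which credentials are configured."""
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("AZURE_SPEECH_KEY") and env.get("AZURE_SPEECH_REGION"):
        return "azure"
    return "system"   # local fallback, no credentials needed

print(choose_engine({}))   # → system
```

The returned name can then be mapped to the matching engine constructor (SystemEngine, AzureEngine, OpenAIEngine) from the examples above.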
The project provides a rich set of test examples:
- `simple_test.py`: Basic "Hello World" demonstration.
- `complex_test.py`: Full-featured demonstration.
- `coqui_test.py`: Local Coqui TTS engine test.
- `translator.py`: Real-time multilingual translation (requires the `openai` and `realtimetts` packages).
- `openai_voice_interface.py`: Voice-activated OpenAI API interface.
- `advanced_talk.py`: Advanced dialogue system.
- `minimalistic_talkbot.py`: Simple chatbot in 20 lines of code.
- `test_callbacks.py`: Callback functionality and latency testing.

For better performance, especially when using local neural engines, it is recommended to install CUDA support:
# CUDA 11.8
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
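Before relying on GPU acceleration, it is worth confirming that PyTorch actually sees a CUDA device. This small convenience check (a sketch) degrades gracefully when torch is not installed at all:

```python
def cuda_available() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()

print("CUDA:", "available" if cuda_available() else "not available")
```

If this reports "not available" despite an installed GPU, the usual culprit is a CPU-only torch wheel — reinstall with one of the index URLs above.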
RealtimeTTS is part of a larger ecosystem: its sister project RealtimeSTT covers the speech-to-text input side, so the two can be combined into complete voice-interaction pipelines.
The project itself is open source, but note the license restrictions of each engine: local engines such as Coqui XTTS restrict commercial use, while cloud engines (Azure, OpenAI, ElevenLabs, etc.) are governed by their providers' terms of service.
RealtimeTTS is a powerful and well-designed real-time text-to-speech library, suitable for modern applications that require high-quality, low-latency speech synthesis. Its multi-engine support, robust error handling mechanisms, and rich configuration options make it an ideal choice for building professional-grade voice applications. Whether for personal projects or enterprise-level applications, RealtimeTTS provides a reliable and efficient solution.