An advanced real-time text-to-speech Python library that supports multiple TTS engines, featuring low latency and high-quality audio output.
A Detailed Introduction to the RealtimeTTS Project
Project Overview
RealtimeTTS is an advanced real-time Text-to-Speech (TTS) Python library designed for applications that demand low latency and high-quality audio output. It converts text streams into speech with minimal delay, making it well suited to voice assistants, AI dialogue systems, and accessibility tools.
Project Address: https://github.com/KoljaB/RealtimeTTS
Core Features
1. Low Latency Processing
- Near-instantaneous text-to-speech conversion: Optimized processing flow ensures minimal latency.
- LLM Output Compatibility: Can directly process streaming output from Large Language Models.
- Real-time Stream Processing: Supports real-time processing at both character and sentence levels.
2. High-Quality Audio Output
- Clear and Natural Voice: Generates natural, human-like speech.
- Multiple Audio Format Support: Supports various audio output formats.
- Configurable Audio Parameters: Adjustable parameters such as sample rate and bit rate.
3. Multi-Engine Support
RealtimeTTS supports multiple TTS engines, providing a wide range of choices:
Cloud Engines 🌐
- OpenAIEngine: OpenAI's TTS service, offering 6 high-quality voices.
- AzureEngine: Microsoft Azure Speech Service, with 500,000 free characters per month.
- ElevenlabsEngine: High-end voice quality, providing a rich selection of voices.
- GTTSEngine: Free Google Translate TTS, no GPU required.
- EdgeEngine: Microsoft Edge free TTS service.
Local Engines 🏠
- CoquiEngine: High-quality neural TTS, supports local processing and voice cloning.
- ParlerEngine: Local neural TTS, suitable for high-end GPUs.
- SystemEngine: Built-in system TTS, quick setup.
- PiperEngine: Extremely fast TTS system, even runs on Raspberry Pi.
- StyleTTS2Engine: Stylized speech synthesis.
- KokoroEngine: New engine with multilingual support.
- OrpheusEngine: Newly added engine option.
4. Multilingual Support
- Supports speech synthesis in multiple languages.
- Intelligent sentence segmentation and language detection.
- Configurable language-specific parameters (see the sketch below).
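For example, a minimal sketch of steering segmentation with the language parameter; whether a natural-sounding voice is available still depends on the chosen engine:
from RealtimeTTS import TextToAudioStream, SystemEngine

# The language code steers sentence segmentation; the engine must
# still provide a matching voice for the target language.
stream = TextToAudioStream(SystemEngine(), language="de")
stream.feed("Hallo! Dies ist ein kurzer Test auf Deutsch.")
stream.play()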
5. Robustness and Reliability
- Failover Mechanism: Automatically switches to a backup engine when one engine encounters a problem (see the sketch after this list).
- Continuous Operation Assurance: Ensures consistent performance and reliability for critical and professional use cases.
- Error Handling: Comprehensive error handling and recovery mechanisms.
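A minimal sketch of the failover idea, assuming TextToAudioStream accepts a list of engines and falls back in order (check the current API documentation for the exact semantics):
from RealtimeTTS import TextToAudioStream, GTTSEngine, SystemEngine

# Preferred engine first; the offline SystemEngine serves as the backup.
stream = TextToAudioStream([GTTSEngine(), SystemEngine()])
stream.feed("If the first engine fails, playback continues on the backup.")
stream.play()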
Installation
Recommended Installation (Full Version)
pip install -U realtimetts[all]
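Note: in shells like zsh, square brackets are glob characters, so quote the extras: pip install -U "realtimetts[all]".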
Custom Installation
You can choose specific engine support as needed:
# System TTS only
pip install realtimetts[system]
# Azure support
pip install realtimetts[azure]
# Multi-engine combination
pip install realtimetts[azure,elevenlabs,openai]
Available Installation Options
- all: Full installation, supports all engines.
- system: Local system TTS (pyttsx3).
- azure: Azure Speech Service support.
- elevenlabs: ElevenLabs API integration.
- openai: OpenAI TTS service.
- gtts: Google Text-to-Speech.
- edge: Microsoft Edge TTS.
- coqui: Coqui TTS engine.
- minimal: Core package only (for custom engine development).
Core Components
1. Text Stream Processing
- Sentence Boundary Detection: Supports NLTK and Stanza tokenizers (selectable as shown below).
- Intelligent Segmentation: Segments text based on punctuation and language rules.
- Stream Processing: Supports character iterators and generators.
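For example, the tokenizer and language can be chosen when the stream is constructed, a sketch using the parameters documented in the configuration section below:
from RealtimeTTS import TextToAudioStream, SystemEngine

# Use the Stanza tokenizer for sentence boundary detection.
stream = TextToAudioStream(SystemEngine(), tokenizer="stanza", language="en")
stream.feed("First sentence. A second sentence follows immediately.")
stream.play()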
2. Audio Stream Management
- Asynchronous Playback: The play_async() method supports non-blocking playback.
- Synchronous Playback: The play() method blocks until playback finishes.
- Stream Control: Supports pause, resume, and stop operations.
3. Callback System
Provides rich callback functions for monitoring and control (a wiring sketch follows the list):
- on_text_stream_start(): Triggered when the text stream starts.
- on_text_stream_stop(): Triggered when the text stream ends.
- on_audio_stream_start(): Triggered when audio playback starts.
- on_audio_stream_stop(): Triggered when audio playback ends.
- on_character(): Triggered as each character is processed.
- on_word(): Word-level time synchronization (supported by the Azure and Kokoro engines).
Basic Usage Examples
Simple Usage
from RealtimeTTS import TextToAudioStream, SystemEngine
# Create engine and stream
engine = SystemEngine()
stream = TextToAudioStream(engine)
# Input text and play
stream.feed("Hello world! How are you today?")
stream.play_async()
Streaming Text Processing
# Process a string
stream.feed("Hello, this is a sentence.")
# Process a generator (suitable for LLM output)
# Uses the openai>=1.0 client API (pip install openai); the client
# reads OPENAI_API_KEY from the environment.
from openai import OpenAI

client = OpenAI()

def write(prompt: str):
    for chunk in client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        if (text_chunk := chunk.choices[0].delta.content) is not None:
            yield text_chunk
text_stream = write("A three-sentence relaxing speech.")
stream.feed(text_stream)
# Process a character iterator
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)
Playback Control
# Asynchronous playback
import time

stream.play_async()
while stream.is_playing():
    time.sleep(0.1)
# Synchronous playback
stream.play()
# Control operations
stream.pause() # Pause
stream.resume() # Resume
stream.stop() # Stop
Advanced Configuration
TextToAudioStream Parameters
stream = TextToAudioStream(
    engine=engine,                    # TTS engine
    on_text_stream_start=callback,    # Text stream start callback
    on_audio_stream_start=callback,   # Audio stream start callback
    output_device_index=None,         # Audio output device
    tokenizer="nltk",                 # Tokenizer selection
    language="en",                    # Language code
    muted=False,                      # Whether to mute
    level=logging.WARNING             # Log level
)
Playback Parameters
stream.play(
    fast_sentence_fragment=True,           # Fast sentence fragment processing
    buffer_threshold_seconds=0.0,          # Buffer threshold
    minimum_sentence_length=10,            # Minimum sentence length
    log_synthesized_text=False,            # Log synthesized text
    reset_generated_text=True,             # Reset generated text
    output_wavfile=None,                   # Save to WAV file
    on_sentence_synthesized=callback,      # Sentence synthesis complete callback
    before_sentence_synthesized=callback,  # Before sentence synthesis callback
    on_audio_chunk=callback                # Audio chunk ready callback
)
Engine-Specific Configuration
OpenAI Engine
from RealtimeTTS import OpenAIEngine
engine = OpenAIEngine(
    api_key="your-api-key",  # Or set the OPENAI_API_KEY environment variable
    voice="alloy",           # Options: alloy, echo, fable, onyx, nova, shimmer
    model="tts-1"            # Or tts-1-hd
)
Azure Engine
from RealtimeTTS import AzureEngine
engine = AzureEngine(
    speech_key="your-speech-key",   # Or set the AZURE_SPEECH_KEY environment variable
    service_region="your-region",   # For example: "eastus"
    voice_name="en-US-AriaNeural"   # Azure voice name
)
Coqui Engine (Voice Cloning)
from RealtimeTTS import CoquiEngine
engine = CoquiEngine(
    voice="path/to/voice/sample.wav",  # Voice cloning source file
    language="en"                      # Language code
)
Test Files
The project provides a rich set of test examples:
- simple_test.py: Basic "Hello World" demonstration.
- complex_test.py: Full-featured demonstration.
- coqui_test.py: Local Coqui TTS engine test.
- translator.py: Real-time multilingual translation (requires installing openai and realtimetts).
- openai_voice_interface.py: Voice-activated OpenAI API interface.
- advanced_talk.py: Advanced dialogue system.
- minimalistic_talkbot.py: A simple chatbot in 20 lines of code.
- test_callbacks.py: Callback functionality and latency testing.
CUDA Support
For better performance, especially when using local neural engines, it is recommended to install CUDA support:
Installation Steps
- Install NVIDIA CUDA Toolkit (version 11.8 or 12.X).
- Install NVIDIA cuDNN.
- Install ffmpeg.
- Install CUDA-enabled PyTorch:
# CUDA 11.8
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.X
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
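After installation, a quick check confirms that PyTorch can see the GPU:
import torch

print(torch.cuda.is_available())  # True if the CUDA build is active
print(torch.version.cuda)         # CUDA version PyTorch was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # Name of the detected NVIDIA GPU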
Application Scenarios
1. AI Assistants and Chatbots
- Real-time response to user queries.
- Natural conversational experience.
- Multilingual support.
2. Accessibility Tools
- Screen readers.
- Visual impairment assistance.
- Learning aids.
3. Content Creation
- Podcast production.
- Audiobooks.
- Educational content.
4. Customer Service
- Automated customer service systems.
- Automated phone agents.
- Real-time translation services.
5. Games and Entertainment
- In-game voice.
- Virtual character voice acting.
- Interactive entertainment applications.
Project Ecosystem
RealtimeTTS is part of a larger ecosystem:
- RealtimeSTT: A complementary speech-to-text library; combined with RealtimeTTS, it forms a complete real-time voice pipeline (see the sketch after this list).
- Linguflex: The original project, a powerful open-source AI assistant.
- LocalAIVoiceChat: A local AI voice dialogue system based on the Zephyr 7B model.
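A hedged sketch of that pairing, assuming RealtimeSTT's AudioToTextRecorder API, where recorder.text() blocks until a phrase has been transcribed:
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import TextToAudioStream, SystemEngine

recorder = AudioToTextRecorder()
stream = TextToAudioStream(SystemEngine())

while True:
    heard = recorder.text()            # wait for a transcribed phrase
    stream.feed(f"You said: {heard}")  # speak it back
    stream.play()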
License Information
The project itself is open source, but note the license restrictions of each engine:
- Open Source Engines: SystemEngine, GTTSEngine (MIT License).
- Commercially Restricted Engines: CoquiEngine, ElevenlabsEngine, AzureEngine (free for non-commercial use).
- Paid Services: OpenAI requires an API key and a paid plan.
System Requirements
- Python Version: >= 3.9, < 3.13
- Operating System: Windows, macOS, Linux
- Dependencies: PyAudio, pyttsx3, pydub, etc.
- GPU Support: NVIDIA graphics card recommended for local neural engines.
Summary
RealtimeTTS is a powerful and well-designed real-time text-to-speech library, suitable for modern applications that require high-quality, low-latency speech synthesis. Its multi-engine support, robust error handling mechanisms, and rich configuration options make it an ideal choice for building professional-grade voice applications. Whether for personal projects or enterprise-level applications, RealtimeTTS provides a reliable and efficient solution.