Omnilingual ASR – Meta’s Open-Source Multilingual Speech Recognition System
Project Overview
Omnilingual ASR is a revolutionary open-source speech recognition system developed by Meta’s Fundamental AI Research (FAIR) team. The system supports speech recognition in over 1,600 languages, including hundreds previously unsupported by any ASR technology. What makes this project unique is that beyond its officially trained 1,600+ languages, it can extend via zero-shot in-context learning to support more than 5,400 languages—covering nearly all known spoken languages with written scripts.
Core Features
1. Unprecedented Language Coverage
- 1,600+ officially supported languages: Fully trained language support
- 5,400+ potentially supported languages: Extendable via zero-shot learning
- Low-resource language support: 78% of supported languages achieve Character Error Rate (CER) below 10%
- Includes Japanese support: language code jpn_Jpan
2. Open-Source Licensing
The project is fully open-sourced under the Apache 2.0 license—not Meta’s previously used restrictive Llama license. This means researchers and developers can freely use it immediately, even for commercial and enterprise-grade projects, without restrictions.
3. Zero-Shot Learning Capability
Through zero-shot in-context learning, users can provide a few audio-text paired examples of a new language during inference, enabling the model to transcribe additional utterances in that language without any retraining. This grants the system unprecedented scalability.
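The exact zero-shot interface is not documented here, so the snippet below is only a hypothetical sketch: the model card name omniASR_LLM_7B_ZS is inferred from the 7B_ZS variant mentioned later, and the context_examples keyword is an invented placeholder, not a confirmed parameter of the pipeline.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Hypothetical sketch only: both the "_ZS" model card name and the
# "context_examples" keyword are assumptions, not confirmed API.
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B_ZS")

# A few paired (audio, transcript) examples in the new language...
context_examples = [
    {"audio": "/path/to/newlang_example1.wav", "text": "transcript of example 1"},
    {"audio": "/path/to/newlang_example2.wav", "text": "transcript of example 2"},
]

# ...condition the model so it can transcribe further utterances
# in that language without any retraining.
transcriptions = pipeline.transcribe(
    ["/path/to/newlang_target.wav"],
    context_examples=context_examples,  # assumed keyword; check the project docs
    batch_size=1,
)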
Technical Architecture
Model Families
The project includes multiple model variants:
W2V (Wav2Vec 2.0) Encoder Series
- Parameter scales: 300M, 1B, 3B, 7B
- Used for extracting multilingual speech representations
CTC Decoder Series
- Based on the Connectionist Temporal Classification (CTC) framework
- Parameter scales: 300M, 1B, 3B, 7B
LLM Decoder Series
- Based on the Transformer architecture
- Parameter scales: 300M, 1B, 3B, 7B
- Includes a zero-shot variant (7B_ZS)
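Only the omniASR_LLM_7B model card appears in the usage examples below; the snippet here assumes the smaller variants follow the same naming pattern, which this document does not confirm.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# "omniASR_LLM_7B" is taken from the examples later in this document;
# the smaller card name below is assumed to follow the same pattern.
pipeline_7b = ASRInferencePipeline(model_card="omniASR_LLM_7B")
# pipeline_small = ASRInferencePipeline(model_card="omniASR_LLM_300M")  # assumed card name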
Core Technical Innovations
By scaling the wav2vec 2.0 encoder to 7 billion parameters for the first time, the system learns rich, large-scale multilingual semantic representations directly from raw, untranscribed speech data.
Dataset
Omnilingual ASR Corpus
Meta collaborated with researchers and community organizations across Africa, Asia, and other regions to create the Omnilingual ASR Corpus—a 3,350-hour dataset covering 348 low-resource languages.
Collaborating organizations include:
- African Next Voices (supported by the Gates Foundation)
- Mozilla Foundation’s Common Voice project
- Lanfrica / NaijaVoices
Dataset features:
- Released under the CC-BY-4.0 license
- Contains natural, unscripted speech
- Culturally relevant, open-ended prompt design
Installation and Usage
Basic Installation
# Using pip
pip install omnilingual-asr
# Using uv
uv add omnilingual-asr
Note: Audio support requires the libsndfile library (Mac: brew install libsndfile).
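To confirm that libsndfile is visible from Python after installing it, the soundfile package (a common Python binding for libsndfile) can report the linked version; whether omnilingual-asr itself loads audio through soundfile is an assumption here.
# Quick sanity check that libsndfile is reachable from Python.
# Assumes the "soundfile" package is installed in the same environment.
import soundfile as sf

print("libsndfile version:", sf.__libsndfile_version__)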
Basic Usage Example
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
# Initialize pipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
# Prepare audio files and languages
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
# Perform transcription
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
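Assuming transcribe() returns one string per input file, in the same order as audio_files, the results can be paired back with their sources:
# Pair each input file with its transcription (assumes the returned list
# preserves the order of audio_files).
for path, text in zip(audio_files, transcriptions):
    print(f"{path}: {text}")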
Checking Supported Languages
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
Language format: {language_code}_{script}, e.g.:
- eng_Latn – English (Latin script)
- cmn_Hans – Mandarin Chinese (Simplified Chinese script)
- jpn_Jpan – Japanese (Japanese script)
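Because every entry in supported_langs follows this {language_code}_{script} pattern, the list can be filtered with ordinary string operations; a small sketch:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Split "eng_Latn"-style entries into (language, script) pairs,
# assuming every entry follows the {language_code}_{script} pattern.
pairs = [tuple(code.rsplit("_", 1)) for code in supported_langs]

# Example: list every language written in the Latin script.
latin_script = sorted(lang for lang, script in pairs if script == "Latn")
print(f"{len(latin_script)} languages use the Latin script")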
Evaluating with the Dataset
from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
# Load dataset for a specific language
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn",
                            split="train", streaming=True)
batch = next(omni_dataset.iter(5))
# Convert to pipeline input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]
# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)
Performance Metrics
The largest model, the 7B LLM-ASR variant, achieves a Character Error Rate (CER) below 10% for nearly 80% of supported languages (see the CER sketch after this list), including:
- 236 languages trained with over 50 hours of data
- 195 languages achieving good performance with less than 10 hours of training data
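For reference, CER is the character-level edit distance between hypothesis and reference divided by the reference length; below is a minimal, self-contained implementation for spot-checking transcriptions, not the project's own evaluation code.
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Dynamic-programming edit distance (substitutions, insertions, deletions).
    dist = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dist[0] = dist[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                dist[j] + 1,        # deletion
                dist[j - 1] + 1,    # insertion
                prev + (r != h),    # substitution (cost 0 on a match)
            )
            prev, dist[j] = dist[j], cur
    return dist[len(hyp)] / max(len(ref), 1)

# One substitution over six reference characters: 1 / 6 ≈ 0.167
print(character_error_rate("kitten", "sitten"))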
Applications and Impact
This system holds significant implications for education, government, and NGOs:
- Education: Enables transcription and translation of lectures or oral traditions in native languages
- Government & NGOs: Provides accessible voice interfaces and documentation tools for marginalized communities
- AI Industry: Demonstrates that globally scalable AI systems can be built on open, community-driven foundations
Current Limitations
⚠️ Important note: Inference currently accepts only audio files shorter than 40 seconds. Support for transcribing audio of unrestricted length is planned for an upcoming release.
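Until then, one workaround is to split a long recording into sub-40-second chunks and transcribe them separately; a minimal sketch reusing the dict input format from the dataset example above (it assumes a mono file and does not try to cut at silence, so words at chunk boundaries may be garbled):
import soundfile as sf
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

def split_into_chunks(path, max_seconds=39.0):
    """Split a mono audio file into chunks shorter than the 40-second limit."""
    waveform, sample_rate = sf.read(path)  # assumes a mono recording
    chunk_len = int(max_seconds * sample_rate)
    return [
        {"waveform": waveform[start:start + chunk_len], "sample_rate": sample_rate}
        for start in range(0, len(waveform), chunk_len)
    ]

# Transcribe each chunk separately and join the pieces afterwards.
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
chunks = split_into_chunks("/path/to/long_recording.wav")
transcriptions = pipeline.transcribe(chunks, batch_size=2)
full_text = " ".join(transcriptions)
print(full_text)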
Project Resources
- GitHub Repository: https://github.com/facebookresearch/omnilingual-asr
- Dataset: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus
- Online Demo: https://huggingface.co/spaces/facebook/omniasr-transcriptions
- Technical Paper: https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/
Citation
If you use Omnilingual ASR in your research, please cite using the following BibTeX format:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
Summary
Omnilingual ASR represents a major breakthrough in speech recognition technology—not only achieving unprecedented language coverage but, more importantly, offering openness and extensibility that bring genuine technological democratization to global linguistic communities. It marks a shift in the ASR field from centralized, closed cloud services toward community-extensible infrastructure, transforming speech recognition into an inclusive rather than restrictive tool.