Omnilingual ASR – Meta’s Open-Source Multilingual Speech Recognition System
Project Overview
Omnilingual ASR is a revolutionary open-source speech recognition system developed by Meta’s Fundamental AI Research (FAIR) team. The system supports speech recognition in over 1,600 languages, including hundreds previously unsupported by any ASR technology. What makes this project unique is that beyond its officially trained 1,600+ languages, it can extend via zero-shot in-context learning to support more than 5,400 languages—covering nearly all known spoken languages with written scripts.
Core Features
1. Unprecedented Language Coverage
- 1,600+ officially supported languages: Fully trained language support
- 5,400+ potentially supported languages: Extendable via zero-shot learning
- Low-resource language support: 78% of supported languages achieve Character Error Rate (CER) below 10%
- Includes Japanese support: language code jpn_Jpan
2. Open-Source Licensing
The project is fully open-sourced under the Apache 2.0 license—not Meta’s previously used restrictive Llama license. This means researchers and developers can freely use it immediately, even for commercial and enterprise-grade projects, without restrictions.
3. Zero-Shot Learning Capability
Through zero-shot in-context learning, users can provide a few audio-text paired examples of a new language during inference, enabling the model to transcribe additional utterances in that language without any retraining. This grants the system unprecedented scalability.
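The exact zero-shot interface is not documented here, so the snippet below is only a hypothetical sketch: the model card name omniASR_LLM_7B_ZS is inferred from the 7B_ZS variant mentioned later, and the context_examples keyword is an invented placeholder, not a confirmed parameter of the pipeline.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Hypothetical sketch only: both the "_ZS" model card name and the
# "context_examples" keyword are assumptions, not confirmed API.
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B_ZS")

# A few paired (audio, transcript) examples in the new language...
context_examples = [
    {"audio": "/path/to/newlang_example1.wav", "text": "transcript of example 1"},
    {"audio": "/path/to/newlang_example2.wav", "text": "transcript of example 2"},
]

# ...condition the model so it can transcribe further utterances
# in that language without any retraining.
transcriptions = pipeline.transcribe(
    ["/path/to/newlang_target.wav"],
    context_examples=context_examples,  # assumed keyword; check the project docs
    batch_size=1,
)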
Technical Architecture
Model Families
The project includes multiple model variants:
W2V (Wav2Vec 2.0) Encoder Series
- Parameter scales: 300M, 1B, 3B, 7B
- Used for extracting multilingual speech representations
CTC Decoder Series
- Based on the Connectionist Temporal Classification (CTC) framework
- Parameter scales: 300M, 1B, 3B, 7B
LLM Decoder Series
- Based on the Transformer architecture
- Parameter scales: 300M, 1B, 3B, 7B
- Includes a zero-shot variant (7B_ZS)
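Only the omniASR_LLM_7B model card appears in the usage examples below; the snippet here assumes the smaller variants follow the same naming pattern, which this document does not confirm.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# "omniASR_LLM_7B" is taken from the examples later in this document;
# the smaller card name below is assumed to follow the same pattern.
pipeline_7b = ASRInferencePipeline(model_card="omniASR_LLM_7B")
# pipeline_small = ASRInferencePipeline(model_card="omniASR_LLM_300M")  # assumed card name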
Core Technical Innovations
By scaling the wav2vec 2.0 encoder to 7 billion parameters for the first time, the system learns rich, large-scale multilingual semantic representations directly from raw, untranscribed speech data.
Dataset
Omnilingual ASR Corpus
Meta collaborated with researchers and community organizations across Africa, Asia, and other regions to create the Omnilingual ASR Corpus—a 3,350-hour dataset covering 348 low-resource languages.
Collaborating organizations include:
- African Next Voices (supported by the Gates Foundation)
- Mozilla Foundation’s Common Voice project
- Lanfrica / NaijaVoices
Dataset features:
- Released under the CC-BY-4.0 license
- Contains natural, unscripted speech
- Culturally relevant, open-ended prompt design
Installation and Usage
Basic Installation
# Using pip
pip install omnilingual-asr
# Using uv
uv add omnilingual-asr
Note: Audio support requires the libsndfile library (Mac: brew install libsndfile).
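To confirm that libsndfile is visible from Python after installing it, the soundfile package (a common Python binding for libsndfile) can report the linked version; whether omnilingual-asr itself loads audio through soundfile is an assumption here.
# Quick sanity check that libsndfile is reachable from Python.
# Assumes the "soundfile" package is installed in the same environment.
import soundfile as sf

print("libsndfile version:", sf.__libsndfile_version__)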
Basic Usage Example
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
# Initialize pipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
# Prepare audio files and languages
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
# Perform transcription
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
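Assuming transcribe() returns one string per input file, in the same order as audio_files, the results can be paired back with their sources:
# Pair each input file with its transcription (assumes the returned list
# preserves the order of audio_files).
for path, text in zip(audio_files, transcriptions):
    print(f"{path}: {text}")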
Checking Supported Languages
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
Language format: {language_code}_{script}, e.g.:
- eng_Latn – English (Latin script)
- cmn_Hans – Mandarin Chinese (Simplified Chinese script)
- jpn_Jpan – Japanese (Japanese script)
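Because every entry in supported_langs follows this {language_code}_{script} pattern, the list can be filtered with ordinary string operations; a small sketch:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Split "eng_Latn"-style entries into (language, script) pairs,
# assuming every entry follows the {language_code}_{script} pattern.
pairs = [tuple(code.rsplit("_", 1)) for code in supported_langs]

# Example: list every language written in the Latin script.
latin_script = sorted(lang for lang, script in pairs if script == "Latn")
print(f"{len(latin_script)} languages use the Latin script")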
Evaluating with the Dataset
from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
# Load dataset for a specific language
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn",
                            split="train", streaming=True)
batch = next(omni_dataset.iter(5))
# Convert to pipeline input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]
# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)
Performance Metrics
The largest model, the 7B LLM-ASR variant, achieves a Character Error Rate (CER) below 10% for nearly 80% of supported languages (see the CER sketch after this list), including:
- 236 languages trained with over 50 hours of data
- 195 languages achieving good performance with less than 10 hours of training data
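For reference, CER is the character-level edit distance between hypothesis and reference divided by the reference length; below is a minimal, self-contained implementation for spot-checking transcriptions, not the project's own evaluation code.
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Dynamic-programming edit distance (substitutions, insertions, deletions).
    dist = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dist[0] = dist[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                dist[j] + 1,        # deletion
                dist[j - 1] + 1,    # insertion
                prev + (r != h),    # substitution (cost 0 on a match)
            )
            prev, dist[j] = dist[j], cur
    return dist[len(hyp)] / max(len(ref), 1)

# One substitution over six reference characters: 1 / 6 ≈ 0.167
print(character_error_rate("kitten", "sitten"))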
Applications and Impact
This system holds significant implications for education, government, and NGOs:
- Education: Enables transcription and translation of lectures or oral traditions in native languages
- Government & NGOs: Provides accessible voice interfaces and documentation tools for marginalized communities
- AI Industry: Demonstrates that globally scalable AI systems can be built on open, community-driven foundations
Current Limitations
⚠️ Important note: Inference currently accepts only audio files shorter than 40 seconds. Support for transcribing audio of unrestricted length is planned for an upcoming release.
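Until then, one workaround is to split a long recording into sub-40-second chunks and transcribe them separately; a minimal sketch reusing the dict input format from the dataset example above (it assumes a mono file and does not try to cut at silence, so words at chunk boundaries may be garbled):
import soundfile as sf
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

def split_into_chunks(path, max_seconds=39.0):
    """Split a mono audio file into chunks shorter than the 40-second limit."""
    waveform, sample_rate = sf.read(path)  # assumes a mono recording
    chunk_len = int(max_seconds * sample_rate)
    return [
        {"waveform": waveform[start:start + chunk_len], "sample_rate": sample_rate}
        for start in range(0, len(waveform), chunk_len)
    ]

# Transcribe each chunk separately and join the pieces afterwards.
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
chunks = split_into_chunks("/path/to/long_recording.wav")
transcriptions = pipeline.transcribe(chunks, batch_size=2)
full_text = " ".join(transcriptions)
print(full_text)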
Project Resources
- GitHub Repository: https://github.com/facebookresearch/omnilingual-asr
- Dataset: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus
- Online Demo: https://huggingface.co/spaces/facebook/omniasr-transcriptions
- Technical Paper: https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/
Citation
If you use Omnilingual ASR in your research, please cite using the following BibTeX format:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
Summary
Omnilingual ASR represents a major breakthrough in speech recognition technology—not only achieving unprecedented language coverage but, more importantly, offering openness and extensibility that bring genuine technological democratization to global linguistic communities. It marks a shift in the ASR field from centralized, closed cloud services toward community-extensible infrastructure, transforming speech recognition into an inclusive rather than restrictive tool.