facebookresearch/omnilingual-asr

Meta's open-source multilingual speech recognition system supporting 1,600+ languages

NOASSERTION · Python · omnilingual-asr · facebookresearch · 2.7k · Last Updated: December 30, 2025

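Omnilingual ASR - Meta's Open-Source Multilingual Speech Recognition System

Project Overview

Omnilingual ASR is an open-source speech recognition system developed by Meta's Fundamental AI Research (FAIR) team. It supports speech recognition in more than 1,600 languages, including hundreds that no ASR technology had covered before. Beyond the 1,600 languages it was trained on, the system can extend to more than 5,400 languages through zero-shot in-context learning, which covers nearly every spoken language that has a known writing system.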

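Key Features

1. Unprecedented Language Coverage

1,600+ officially supported languages: covered by full training
5,400+ potentially supported languages: reachable through zero-shot learning
Low-resource language support: 78% of supported languages reach a character error rate (CER) below 10%
Japanese is included: the language code is jpn_Jpan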

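2. Open-Source License

The project is fully open source under the Apache 2.0 license rather than the more restrictive Llama license Meta has used previously. Researchers and developers can use it immediately and free of charge, including in commercial and enterprise projects, subject only to the terms of that license.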

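3. Zero-Shot Learning Capability

With zero-shot in-context learning, a user supplies a few paired audio-text examples in a new language at inference time, and the model can then transcribe further utterances in that language without any retraining. This makes the system extensible to languages it was never trained on.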
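The idea of supplying a few audio-text pairs at inference time can be pictured with a small data structure. The sketch below is purely illustrative: `ContextPair` and `build_zero_shot_request` are hypothetical names and are not part of the omnilingual-asr API, and the file paths are placeholders. It only validates and packages inputs; the actual zero-shot interface should be taken from the repository's documentation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContextPair:
    """One in-context example: an audio clip and its transcript (hypothetical type)."""
    audio_path: str
    text: str


def build_zero_shot_request(context: List[ContextPair], target_audio: str) -> Dict[str, object]:
    """Bundle context examples with the utterance to transcribe.

    Hypothetical helper: it checks the inputs and packages them; no model is called.
    """
    if not context:
        raise ValueError("zero-shot transcription needs at least one audio-text example")
    for pair in context:
        if not pair.text.strip():
            raise ValueError(f"empty transcript for {pair.audio_path}")
    return {
        "context": [(p.audio_path, p.text) for p in context],
        "target": target_audio,
    }


# Placeholder paths and transcripts, for illustration only.
examples = [
    ContextPair("/path/to/example1.wav", "first reference transcript"),
    ContextPair("/path/to/example2.wav", "second reference transcript"),
]
request = build_zero_shot_request(examples, "/path/to/new_utterance.wav")
print(len(request["context"]), "context examples for", request["target"])
```

Keeping the pairs in an explicit structure makes it easy to reject an empty context or a missing transcript before any expensive model call is made.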

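Technical Architecture

Model Family

The project includes several model variants:

W2V (Wav2Vec 2.0) encoder series
- Parameter sizes: 300M, 1B, 3B, 7B
- Used to extract multilingual speech representations
CTC decoder series
- Based on the connectionist temporal classification (CTC) framework
- Parameter sizes: 300M, 1B, 3B, 7B
LLM decoder series
- Based on the Transformer architecture
- Parameter sizes: 300M, 1B, 3B, 7B
- Includes a zero-shot variant (7B_ZS)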
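The families and sizes listed above can be enumerated programmatically. The following is a minimal sketch that assumes every model card follows the pattern `omniASR_{family}_{size}`, extrapolated from the single card name `omniASR_LLM_7B` used in the usage examples on this page. `model_card_name` and `all_model_cards` are hypothetical helpers, and the generated names should be checked against the repository before use.

```python
from typing import List

# Families and sizes as listed above. The naming pattern itself is an assumption
# extrapolated from the card name "omniASR_LLM_7B".
FAMILIES = ("W2V", "CTC", "LLM")
SIZES = ("300M", "1B", "3B", "7B")


def model_card_name(family: str, size: str, zero_shot: bool = False) -> str:
    """Build a model card name such as 'omniASR_LLM_7B' (hypothetical helper)."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if zero_shot and (family, size) != ("LLM", "7B"):
        raise ValueError("the zero-shot variant is listed only for the 7B LLM decoder")
    suffix = "_ZS" if zero_shot else ""
    return f"omniASR_{family}_{size}{suffix}"


def all_model_cards() -> List[str]:
    """Enumerate every family/size combination plus the zero-shot variant."""
    cards = [model_card_name(f, s) for f in FAMILIES for s in SIZES]
    cards.append(model_card_name("LLM", "7B", zero_shot=True))
    return cards


for card in all_model_cards():
    print(card)
```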

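Core Technical Innovation

By scaling the wav2vec 2.0 encoder to 7 billion parameters, the system for the first time produces rich, massively multilingual semantic representations from raw, untranscribed speech data.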

Dataset

Omnilingual ASR Corpus

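Meta partnered with researchers and community organizations in Africa, Asia, and elsewhere to create the Omnilingual ASR Corpus, a 3,350-hour dataset covering 348 low-resource languages.

Partner organizations include:

African Next Voices (supported by the Gates Foundation)
The Mozilla Foundation's Common Voice project
Lanfrica / NaijaVoices

Dataset characteristics:

Released under the CC-BY-4.0 license
Contains natural, unscripted speech
Built on culturally relevant, open-ended prompts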

Installation and Usage

Basic Installation

# Using pip
pip install omnilingual-asr

# Using uv
uv add omnilingual-asr

Note: audio support requires the libsndfile library (Mac: brew install libsndfile)
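Before installing, it can help to check whether the system linker can already find libsndfile. The sketch below uses only the standard library's `ctypes.util.find_library`; `libsndfile_location` is a hypothetical helper name. A `None` result is a hint rather than proof, since some Python audio packages ship their own copy of the library.

```python
from ctypes.util import find_library
from typing import Optional


def libsndfile_location() -> Optional[str]:
    """Return the name or path the system reports for libsndfile, or None if not found."""
    return find_library("sndfile")


location = libsndfile_location()
if location is None:
    # On macOS the note above suggests: brew install libsndfile
    print("libsndfile was not found on the default library search path")
else:
    print(f"libsndfile found: {location}")
```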

Basic Usage Example

from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Initialize the pipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")

# Prepare the audio files and their languages (one language code per file)
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]

# Run transcription
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)

Viewing Supported Languages

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)

# Check whether a specific language is supported
if "eng_Latn" in supported_langs:
    print("English (Latin script) is supported!")

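Language format: {language code}_{script}, for example:

eng_Latn - English (Latin script)
cmn_Hans - Mandarin Chinese (Simplified Chinese script)
jpn_Jpan - Japanese (Japanese script)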
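The `{language code}_{script}` format can be validated before a tag is passed to the pipeline. The sketch below assumes the shape seen in the three examples above: three lowercase letters, an underscore, and a four-letter script code with an initial capital. `LangTag` and `parse_lang_tag` are hypothetical names. A tag that passes this check is only well formed; whether it is actually supported still has to be checked against `supported_langs`.

```python
import re
from typing import NamedTuple


class LangTag(NamedTuple):
    """A language tag split into its two parts (hypothetical type)."""
    language: str
    script: str


# Assumed shape, inferred from the examples above: 'eng_Latn', 'cmn_Hans', 'jpn_Jpan'.
_LANG_TAG = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")


def parse_lang_tag(tag: str) -> LangTag:
    """Split a tag like 'cmn_Hans' into language and script, or raise ValueError."""
    match = _LANG_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a {{language}}_{{script}} tag: {tag!r}")
    return LangTag(language=match.group(1), script=match.group(2))


for tag in ("eng_Latn", "cmn_Hans", "jpn_Jpan"):
    parsed = parse_lang_tag(tag)
    print(f"{tag}: language={parsed.language}, script={parsed.script}")
```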

Evaluating with the Dataset

from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Load the dataset for a specific language (streaming avoids a full download)
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn",
                             split="train", streaming=True)
# Take the first batch of 5 examples
batch = next(omni_dataset.iter(5))

# Convert to the pipeline's input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]

# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)
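To turn transcriptions into a score, each one is compared with its reference transcript using the character error rate (CER): the character-level edit distance divided by the reference length. The sketch below is a plain-Python implementation for illustration. `edit_distance` and `cer` are names chosen here and are not library functions, and it makes no assumption about which corpus field holds the reference text. Real evaluations usually also normalize text (case, punctuation, Unicode form) before scoring, which this sketch omits.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) with a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of 11 reference characters.
print(f"CER = {cer('hello world', 'hallo world'):.3f}")
```

A CER below 0.10 on this scale corresponds to the "below 10%" threshold quoted in the performance figures.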

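Performance Metrics

The largest model, 7B-LLM-ASR, achieves a character error rate (CER) below 10% on nearly 80% of supported languages. Among these:

236 languages with more than 50 hours of training data
195 languages that reached good results with fewer than 10 hours of training data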

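Potential Applications

The system matters for education, government, and non-governmental organizations:

Education: transcribing and translating oral traditions or lectures in native languages
Government and NGOs: providing accessible voice interfaces and documentation tools for marginalized groups
AI industry: demonstrating that global-scale AI systems can be built on open, community-driven foundations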

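Current Limitations

⚠️ Important: inference currently accepts only audio files of 40 seconds or less. The team plans to add support for transcribing audio of unlimited length soon.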
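Until longer inputs are supported, one workaround is to split a recording into pieces that stay under the limit and transcribe each piece separately. The sketch below is a naive fixed-length splitter over an in-memory sequence of samples. `split_waveform` is a hypothetical helper, and the 39-second default is an arbitrary safety margin chosen here. Fixed-length cuts can land in the middle of a word, so splitting at silences would likely give better transcripts.

```python
from typing import List, Sequence

LIMIT_SECONDS = 40.0  # inference limit stated above


def split_waveform(samples: Sequence[float], sample_rate: int,
                   max_seconds: float = 39.0) -> List[Sequence[float]]:
    """Cut a waveform into consecutive chunks of at most max_seconds each.

    Hypothetical helper; the 39 s default keeps a margin under the 40 s limit.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if not 0 < max_seconds <= LIMIT_SECONDS:
        raise ValueError(f"max_seconds must be in (0, {LIMIT_SECONDS}]")
    chunk_len = int(sample_rate * max_seconds)
    if chunk_len < 1:
        raise ValueError("max_seconds is too short for this sample rate")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# 100 seconds of silence at 16 kHz, used only as stand-in data.
sample_rate = 16_000
waveform = [0.0] * (sample_rate * 100)
chunks = split_waveform(waveform, sample_rate)
print([len(chunk) / sample_rate for chunk in chunks])
```

Each chunk can then be wrapped in the same `{"waveform": ..., "sample_rate": ...}` structure used in the evaluation example, after converting it to whatever array type the pipeline expects.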

Project Resources

GitHub repository: https://github.com/facebookresearch/omnilingual-asr
Dataset: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus
Online demo: https://huggingface.co/spaces/facebook/omniasr-transcriptions
Technical paper: https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/

Citation

If you use Omnilingual ASR in your research, please cite it with the following BibTeX entry:

@misc{omnilingualasr2025,
  title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
  author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
  year={2025},
  url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}

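Summary

Omnilingual ASR marks a major advance in speech recognition. It delivers unprecedented language coverage, and its openness and extensibility put the technology within reach of language communities worldwide. It also signals a shift in ASR from centralized, closed cloud services toward community-extensible infrastructure, making speech recognition an inclusive tool rather than a restrictive one.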