
A scalable generative AI framework built for researchers and developers, focusing on large language models, multimodal AI, and speech AI (automatic speech recognition and text-to-speech).

License: Apache-2.0 · Language: Python · Stars: 14.9k · Maintainer: NVIDIA · Last Updated: 2025-06-19

NVIDIA NeMo Project Detailed Introduction

Project Overview

The NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and PyTorch developers, focusing on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Computer Vision (CV). The framework aims to help users efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.

Core Features

NeMo 2.0 Major Updates

NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability:

  • Python-based Configuration - Transition from YAML files to Python-based configuration, providing greater flexibility and control.
  • Modular Abstraction - Adoption of PyTorch Lightning's modular abstraction, simplifying adaptation and experimentation.
  • Scalability - Seamless scaling to large-scale experiments with thousands of GPUs using NeMo-Run.
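The shift from YAML to Python-based configuration can be illustrated with a small, self-contained sketch. The config classes below (`OptimizerConfig`, `TrainingConfig`) are hypothetical stand-ins using standard-library dataclasses, not NeMo 2.0's actual recipe classes; the point is that a variant run is derived programmatically instead of by hand-editing a YAML file:

```python
from dataclasses import dataclass, field, replace

# Hypothetical config objects illustrating the YAML -> Python shift;
# real NeMo 2.0 experiments use the framework's own recipe/config classes.
@dataclass
class OptimizerConfig:
    name: str = "adamw"
    lr: float = 3e-4

@dataclass
class TrainingConfig:
    model_name: str = "gpt-small"
    num_gpus: int = 8
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)

base = TrainingConfig()
# Programmatic overrides replace hand-edited YAML: derive a large-scale
# variant in code while leaving the base configuration untouched.
large_run = replace(
    base,
    model_name="gpt-large",
    num_gpus=1024,
    optimizer=replace(base.optimizer, lr=1e-4),
)
```

Because configurations are ordinary Python objects, they can be composed, validated, and swept over with regular language constructs rather than templating.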

Technical Architecture Advantages

All NeMo models are trained using Lightning, with training automatically scaling to thousands of GPUs. The framework integrates cutting-edge distributed training techniques, including:

  • Tensor Parallelism (TP)
  • Pipeline Parallelism (PP)
  • Fully Sharded Data Parallelism (FSDP)
  • Mixture of Experts (MoE)
  • Mixed Precision Training (supporting BFloat16 and FP8)
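The core idea behind tensor parallelism in the list above can be sketched without GPUs at all: shard a weight matrix column-wise across devices, let each device compute its slice of the output, then gather the slices. This pure-Python simulation is illustrative only; NeMo/Megatron's real sharding runs on GPUs with NCCL collectives:

```python
# Pure-Python sketch of tensor (column) parallelism: the weight matrix is
# split column-wise across "devices"; partial outputs are concatenated.

def matmul(x, w):
    """Multiply row-major matrices: returns x @ w."""
    cols = len(w[0])
    return [[sum(xi[k] * w[k][j] for k in range(len(w))) for j in range(cols)]
            for xi in x]

def split_columns(w, parts):
    """Shard a weight matrix column-wise into `parts` equal slices."""
    n = len(w[0]) // parts
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]

x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w, parts=2)           # simulate 2 "GPUs"
partials = [matmul(x, s) for s in shards]    # each device computes its slice
# All-gather step: concatenate partial outputs along the column dimension.
gathered = [sum((p[i] for p in partials), []) for i in range(len(x))]

assert gathered == matmul(x, w)              # matches the unsharded result
```

Pipeline parallelism splits by layer rather than by column, and FSDP shards parameters and optimizer state, but the recurring theme is the same: partition the computation so no single device must hold the full model.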

Transformer-based LLMs and MMs leverage the NVIDIA Transformer Engine for FP8 training on NVIDIA Hopper GPUs, while utilizing NVIDIA Megatron Core to scale Transformer model training.
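To make the BFloat16 half of the mixed-precision story concrete: a bfloat16 value is a float32 with the low 16 mantissa bits dropped, keeping the full 8-bit exponent (and thus float32's dynamic range) at the cost of precision. The helper below is a standard-library illustration of that truncation, not how NeMo performs it (the framework relies on PyTorch and Transformer Engine kernels):

```python
import struct

# Sketch of BFloat16: a float32 with the low 16 mantissa bits dropped.
# The 8-bit exponent is unchanged, so dynamic range is preserved while
# precision falls to roughly 3 significant decimal digits.

def to_bfloat16(x: float) -> float:
    """Round a float32 to the nearest bfloat16, returned as a float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round-to-nearest-even on the 16 bits being discarded.
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    (y,) = struct.unpack("<f", struct.pack("<I", rounded))
    return y

print(to_bfloat16(3.141592653589793))  # 3.140625: ~3 decimal digits survive
```

FP8 pushes the same trade-off further, which is why it requires hardware support (Hopper's Transformer Engine) plus per-tensor scaling to stay numerically stable.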

Main Application Areas

1. Large Language Models and Multimodal Models

Latest Feature Updates

  • AutoModel Support - The latest feature of NeMo Framework, AutoModel, supports 🤗Hugging Face models, with version 25.02 focusing on AutoModelForCausalLM in the text generation category.
  • Blackwell Support - NeMo Framework has added Blackwell support, with version 25.02 focusing on feature parity for B200.

Model Alignment Techniques

NeMo LLMs can be aligned using state-of-the-art methods such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). In addition to Supervised Fine-Tuning (SFT), NeMo also supports the latest Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA, P-Tuning, Adapters, and IA3.
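The intuition behind LoRA, the first PEFT technique named above, fits in a few lines: the frozen weight matrix W is left untouched, and training updates only a low-rank delta B·A with rank r much smaller than the layer dimensions. The shapes and values below are invented for illustration; NeMo wires LoRA adapters into the model's attention layers for you:

```python
# Minimal LoRA sketch: keep the pretrained weight W (d_out x d_in) frozen and
# learn only a rank-r delta B @ A, scaled by alpha / r. All numbers here are
# illustrative, not from any real checkpoint.

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

d_out, d_in, r, alpha = 3, 4, 1, 2.0
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen
A = [[1.0, 0.0, -1.0, 0.5]]                  # r x d_in, trainable
B = [[0.2], [-0.1], [0.4]]                   # d_out x r, trainable

def lora_forward(x):
    base = matvec(W, x)                      # frozen pretrained path
    delta = matvec(B, matvec(A, x))          # low-rank path: B @ (A @ x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

y = lora_forward([1.0, 1.0, 1.0, 1.0])
```

With r = 1 here, the trainable parameters number d_out + d_in = 7 instead of d_out × d_in = 12; at LLM scale the ratio is far more dramatic, which is what makes PEFT feasible on modest hardware.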

2. Cosmos World Foundation Models

The NVIDIA Cosmos platform accelerates the development of world models for physical AI systems. Built on CUDA, Cosmos combines state-of-the-art world foundation models, video tokenizers, and AI-accelerated data processing pipelines. Developers can accelerate world model development by fine-tuning Cosmos world foundation models or building new models from scratch.

3. Speech Recognition Technology

Parakeet Series Models

  • Parakeet-TDT - Demonstrates better accuracy and is 64% faster than the previous best model, Parakeet-RNNT-1.1B.
  • Canary Multilingual Model - Can transcribe speech in English, Spanish, German, and French, with punctuation and capitalization, and also provides bidirectional translation between these languages.

Performance Optimization

The NVIDIA NeMo team has released several inference optimizations for CTC, RNN-T, and TDT models, achieving up to 10x inference speed improvements. These models now exceed 2,000 Real-Time Factor (RTFx), with some even reaching 6,000 RTFx.
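For readers unfamiliar with the metric: RTFx (inverse real-time factor) is audio duration divided by wall-clock processing time, i.e. hours of audio transcribed per hour of compute. The numbers in this sketch are made-up inputs, not measured NeMo results:

```python
# RTFx (inverse real-time factor) as reported for ASR throughput:
# seconds of audio transcribed per second of wall-clock compute.
# The inputs below are invented for illustration.

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    return audio_seconds / processing_seconds

# e.g. 10 hours of audio transcribed in 18 seconds of compute:
speed = rtfx(audio_seconds=10 * 3600, processing_seconds=18)
print(speed)  # 2000.0 -> "2,000x faster than real time"
```

An RTFx of 2,000 thus means a full day of audio can be transcribed in under a minute of compute.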

Installation and Deployment

Supported Installation Methods

  1. Conda/Pip Installation - Suitable for exploring NeMo, recommended for ASR and TTS domains.
  2. NGC PyTorch Container - Install from source code into a highly optimized container.
  3. NGC NeMo Container - Ready-to-use solution containing all dependencies.
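The first method above can be sketched as a pip install into a virtual environment. The extras names (`asr`, `all`) follow the NeMo project's published instructions, but check the current README or documentation before relying on them:

```shell
# Sketch of the Conda/Pip route (method 1), assuming Python >= 3.10.
python3 -m venv nemo-env && source nemo-env/bin/activate
pip install "nemo_toolkit[asr]"        # ASR domain only
# or: pip install "nemo_toolkit[all]"  # every domain; a much larger install
```

The container routes (methods 2 and 3) avoid dependency resolution entirely and are the recommended path for GPU training.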

System Requirements

  • Python 3.10 or higher
  • PyTorch 2.5 or higher
  • NVIDIA GPU (if intending to perform model training)

Platform Support

| Operating System / Platform | PyPi Installation | Source Installation into NGC Container |
| --- | --- | --- |
| Linux - amd64/x86_64 | Limited Support | Fully Supported |
| Linux - arm64 | Limited Support | Limited Support |
| macOS - amd64/x86_64 | Deprecated | Deprecated |
| macOS - arm64 | Limited Support | Limited Support |

Ecosystem and Toolchain

Related Projects

  • NeMo-Run - A tool for configuring, launching, and managing machine learning experiments.
  • NeMo Curator - A scalable data preprocessing and curation toolkit for LLMs.
  • NeMo Guardrails - An open-source toolkit for adding programmable guardrails to LLM-based conversational systems.
  • NeMo Aligner - Model alignment tool.
  • NeMo Skills - A project to improve the "skills" of large language models.

Deployment and Optimization

  • NeMo LLMs and MMs can be deployed and optimized through NVIDIA NeMo Microservices.
  • NeMo ASR and TTS models can be optimized for inference and deployed to production use cases through NVIDIA Riva.

Performance Benchmarks

Benchmark Results

  • MLPerf Training v4.0 - Using the NVIDIA NeMo Framework and NVIDIA Hopper GPUs, NVIDIA was able to scale to 11,616 H100 GPUs and achieve near-linear performance scaling on LLM pre-training.
  • H200 Performance Improvement - Up to 4.2x speedup in Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
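"Near-linear scaling" has a simple quantitative meaning: achieved speedup divided by ideal (linear) speedup. The GPU count below matches the MLPerf figure above, but the throughput numbers are invented purely to show the arithmetic:

```python
# Back-of-the-envelope parallel scaling efficiency. The 11,616-GPU count
# matches the MLPerf run cited above; the throughput values are hypothetical.

def scaling_efficiency(base_gpus, base_tput, scaled_gpus, scaled_tput):
    """Achieved speedup divided by ideal (linear) speedup."""
    ideal = scaled_gpus / base_gpus
    actual = scaled_tput / base_tput
    return actual / ideal

# hypothetical: 512 GPUs -> 100 units/s, 11,616 GPUs -> 2,100 units/s
eff = scaling_efficiency(512, 100.0, 11616, 2100.0)
print(f"{eff:.0%}")  # near-linear scaling keeps this close to 100%
```

Efficiencies close to 100% at five-digit GPU counts are what the combination of tensor, pipeline, and data parallelism is designed to deliver.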

Application Cases and Partners

Enterprise Applications

  • Amazon Titan Foundation Models - NVIDIA NeMo Framework now provides efficient large language model training support for Amazon Titan foundation models.
  • Bria.ai Platform - Leveraging NeMo's multimodal ensemble reference implementation to achieve high-throughput and low-latency image generation.

Cloud Platform Support

  • Amazon EKS - Supports running distributed training workloads on Amazon Elastic Kubernetes Service clusters.
  • Google GKE - Provides end-to-end guidance for training generative AI models on Google Kubernetes Engine.

Open Source and Licensing

The NeMo Framework is open-source under the Apache 2.0 license, welcoming community contributions. The project maintains active development and support on GitHub, providing extensive documentation, tutorials, and example scripts.

Learning Resources

  • Official Documentation - Provides complete user guides and technical documentation.
  • Tutorials - Extensive tutorials that can be run on Google Colab.
  • Example Scripts - Complete suite of examples supporting multi-GPU/multi-node training.
  • Community Support - FAQ and community support provided through the GitHub Discussions board.

The NVIDIA NeMo Framework represents the cutting edge of generative AI development, providing researchers and developers with a powerful, flexible, and scalable platform for building the next generation of AI applications.