Stable Baselines3 (SB3)
Project Overview
Stable Baselines3 (SB3) is a library of reliable reinforcement learning algorithm implementations built on PyTorch. It is the successor to Stable Baselines and aims to provide an easy-to-use, modular, and high-performance toolkit for training reinforcement learning agents. SB3 emphasizes clear documentation, tested code, and reproducible results, making it a solid choice for both researchers and engineers.
Background
Traditional implementations of reinforcement learning algorithms are often complex and difficult to debug. Stable Baselines aimed to simplify this by providing a reliable set of baseline algorithms for users to experiment with and compare against. However, Stable Baselines was built on TensorFlow 1.x; as TensorFlow 2.x superseded it and PyTorch became increasingly prevalent in the research community, a new library was needed to meet the community's needs.
Stable Baselines3 was created to address this need. It is built on PyTorch and incorporates the lessons learned from Stable Baselines, offering a cleaner, more modular design and better performance.
Core Features
- Based on PyTorch: SB3 is built entirely on PyTorch, leveraging its dynamic computation graphs and ease of use.
- Modular Design: SB3's code structure is clear, and algorithm components can be easily combined and customized, making it convenient for users to conduct research and development.
- Easy to Use: SB3 provides a concise API and comprehensive documentation, so users can get started and train their own agents in just a few lines of code (see the quickstart sketch after this list).
- High Performance: SB3 implementations are tuned and benchmarked to reproduce published results in a variety of environments, including Atari games and MuJoCo continuous-control tasks.
- Rich Set of Algorithms: SB3 implements a variety of classic reinforcement learning algorithms, including:
- Policy Gradient Methods: A2C, PPO
- Value-Based Methods: DQN (variants such as QR-DQN are available in the companion sb3-contrib package)
- Actor-Critic Methods: DDPG, SAC, TD3
- Support for Multiple Environments: SB3 integrates seamlessly with environments exposing the Gymnasium API (the maintained successor to OpenAI Gym), including PyBullet, robotics environments, and more.
- Callbacks: SB3 provides a callback mechanism that lets users run custom operations during training, such as logging, saving models, and evaluating performance (sketched after this list).
- Vectorized Environments: SB3 supports vectorized environments that run multiple environment instances in parallel, which speeds up data collection (example below).
- Hyperparameter Optimization: SB3 can be combined with hyperparameter optimization tools such as Optuna to search automatically for well-performing hyperparameter combinations (see the Optuna sketch below).
- Comprehensive Documentation and Examples: SB3 provides detailed documentation and many code examples to help users understand and use the library.
- Type Hints and Testing: SB3 has good type hints and comprehensive test coverage, ensuring the quality and reliability of the code.
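To illustrate the concise API, here is a minimal quickstart sketch that trains PPO on the classic CartPole task; the environment choice and the 10,000-step budget are illustrative, and SB3 v2.x with Gymnasium is assumed.

```python
import gymnasium as gym

from stable_baselines3 import PPO

# Create the environment and train a PPO agent on it
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the trained policy for a few steps
obs, info = env.reset()
for _ in range(100):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```

Swapping PPO for another algorithm (e.g. A2C or DQN) leaves the rest of the script unchanged, which is the main payoff of the unified API.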
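The callback mechanism can be sketched with the built-in EvalCallback, which periodically evaluates the agent on a separate environment and checkpoints the best model; the save path and evaluation frequency below are illustrative choices, not defaults.

```python
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# A separate environment used only for periodic evaluation
eval_env = gym.make("CartPole-v1")
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./logs/",  # where the best checkpoint is written
    eval_freq=1_000,                 # evaluate every 1,000 training steps
    n_eval_episodes=5,
)

model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=10_000, callback=eval_callback)
```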
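Vectorized environments are typically created with the make_vec_env helper; a minimal sketch:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Run 4 copies of CartPole in parallel; rollouts are collected from all of them
vec_env = make_vec_env("CartPole-v1", n_envs=4)
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=25_000)
```

By default make_vec_env wraps the copies in a DummyVecEnv (stepped sequentially in one process); passing vec_env_cls=SubprocVecEnv runs each copy in its own process instead.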
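Optuna support is not built into SB3 itself (the rl-baselines3-zoo project ships a full tuning setup); the sketch below shows one hand-rolled way to wire the two together, with a deliberately small, illustrative search space.

```python
import optuna

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # Sample candidate hyperparameters (these ranges are just examples)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)

    model = PPO(
        "MlpPolicy", "CartPole-v1",
        learning_rate=learning_rate, gamma=gamma, verbose=0,
    )
    model.learn(total_timesteps=10_000)

    # Score the trial by mean evaluation reward
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```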
Application Scenarios
Stable Baselines3 can be applied to various reinforcement learning tasks, including:
- Robot Control: Training robots to complete various tasks, such as navigation, grasping, and assembly.
- Game AI: Developing intelligent agents for games, from Atari titles to StarCraft.
- Autonomous Driving: Training autonomous vehicles to drive safely in complex traffic environments.
- Resource Management: Optimizing resource allocation, such as power distribution, network traffic control, and more.
- Financial Trading: Developing trading strategies to achieve automated trading.
- Recommendation Systems: Optimizing recommendation algorithms to improve user satisfaction.
- Custom Environments: SB3 can be applied to any custom environment that exposes the Gymnasium interface (a minimal sketch follows this list).
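As a sketch of the custom-environment workflow: any class implementing the Gymnasium Env interface can be passed to an SB3 algorithm directly, and the bundled check_env utility verifies compatibility up front. The toy GoLeftEnv below (the agent earns a reward for walking to the left edge of a 1-D grid) is a hypothetical example modeled on the pattern used in the SB3 documentation.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class GoLeftEnv(gym.Env):
    """Toy 1-D grid: the agent starts at the right edge and is rewarded for reaching the left edge."""

    def __init__(self, grid_size: int = 10):
        super().__init__()
        self.grid_size = grid_size
        self.action_space = spaces.Discrete(2)  # 0 = move left, 1 = move right
        self.observation_space = spaces.Box(
            low=0, high=grid_size - 1, shape=(1,), dtype=np.float32
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.grid_size - 1
        return np.array([self.pos], dtype=np.float32), {}

    def step(self, action):
        self.pos += -1 if action == 0 else 1
        self.pos = int(np.clip(self.pos, 0, self.grid_size - 1))
        terminated = self.pos == 0            # reached the goal
        reward = 1.0 if terminated else 0.0
        obs = np.array([self.pos], dtype=np.float32)
        return obs, reward, terminated, False, {}


check_env(GoLeftEnv())                        # raises if the interface is off
model = PPO("MlpPolicy", GoLeftEnv()).learn(total_timesteps=5_000)
```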
Summary
Stable Baselines3 is a powerful and flexible reinforcement learning library based on PyTorch. It provides a rich set of algorithms and tools that can help users quickly build and train reinforcement learning agents. Both researchers and engineers can benefit from SB3 and apply it to various practical problems.