# RLlib: Industry-Grade, Scalable Reinforcement Learning
RLlib is an open-source library for reinforcement learning (RL) that offers support for production-level, highly scalable, and fault-tolerant RL workloads. It maintains simple and unified APIs for a large variety of industry applications. Whether you're training policies in a multi-agent setup, from historical offline data, or using externally connected simulators, RLlib provides straightforward solutions for autonomous decision-making needs, enabling you to start running experiments quickly.
## Getting Started with RLlib
It's easy to get started with RLlib. Follow these steps:
- Install RLlib and PyTorch:
```bash
pip install "ray[rllib]" torch
```
For Apple Silicon (M1) computers, follow the instructions [here](link-to-instructions). To run Atari or MuJoCo examples, you'll also need:
```bash
pip install "gymnasium[atari,accept-rom-license,mujoco]"
```
- Run a Simple Example: Here's how to run the PPO algorithm on the Taxi domain:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations

# Configure the algorithm.
config = (
    PPOConfig()
    .environment("Taxi-v3")
    .env_runners(
        num_env_runners=2,
        env_to_module_connector=lambda env: FlattenObservations(),
    )
    .evaluation(evaluation_num_env_runners=1)
)

# Build the algorithm and train it for five iterations.
algo = config.build()
for _ in range(5):
    print(algo.train())

# Evaluate the trained algorithm.
algo.evaluate()
```
## Key Features and Capabilities
* **Scalability and Fault Tolerance:** RLlib scales along multiple axes: the number of EnvRunner actors used for data collection (set via `config.env_runners(num_env_runners=...)`) and the number of Learner actors used for multi-GPU training (set via `config.learners(num_learners=...)`). Training is fault-tolerant, so experiments can continue even with unstable environments or failing workers. See the scaling sketch after this list.
* **Multi-Agent RL:** RLlib natively supports multi-agent reinforcement learning (MARL), covering independent learning, collaborative training, adversarial training, and combinations thereof. See the multi-agent sketch after this list.
* **Offline RL:** Integration with Ray Data allows large-scale data ingestion for offline RL and behavior cloning (BC). See the BC sketch after this list.
* **Customization:** RLlib offers APIs for customizing environments, models, optimizers, loss functions, and exploration behavior. It's built on Ray, providing distributed and fault-tolerant algorithms, PyTorch default models, multi-GPU training, and multi-agent support.
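
The scaling knobs mentioned above combine in a single config. A minimal sketch; the environment name and the exact counts of runners, learners, and GPUs are illustrative assumptions, not recommended settings:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Collect samples with 4 parallel EnvRunner actors.
    .env_runners(num_env_runners=4)
    # Train with 2 Learner actors, e.g. one GPU each (illustrative values).
    .learners(num_learners=2, num_gpus_per_learner=1)
)
algo = config.build()
```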
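
A minimal multi-agent sketch, mapping two agents to two separate policies. The example-environment import path and the `policy_mapping_fn` signature are assumptions based on recent Ray releases and may differ in yours:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.examples.envs.classes.multi_agent import MultiAgentCartPole

config = (
    PPOConfig()
    # Two CartPole agents stepping through the same episode.
    .environment(MultiAgentCartPole, env_config={"num_agents": 2})
    .multi_agent(
        # Train one policy per agent.
        policies={"p0", "p1"},
        # Map agent IDs (0 and 1) to their policy IDs ("p0" and "p1").
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"p{agent_id}",
    )
)
algo = config.build()
```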
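
And a behavior-cloning sketch for the offline-RL path; the data location is a hypothetical placeholder, and the expected data format depends on how the episodes were recorded:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")
    # Read previously recorded episodes from disk (hypothetical path).
    .offline_data(input_="/tmp/cartpole_offline_data")
)
algo = config.build()
```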
## Supported Algorithms and Environments
RLlib supports a wide range of algorithms (PPO, SAC, DQN, IMPALA, APPO, DreamerV3, BC, CQL, MARWIL) and environments (Gymnasium, PettingZoo, and custom formats).
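
Switching algorithms only changes the config class; the rest of the workflow stays the same. A sketch using DQN, where the environment and hyperparameter values are illustrative assumptions:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .training(lr=0.0005, train_batch_size=32)
)
algo = config.build()
print(algo.train())
```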
## Learn More
* **Environments:** [Link to Environments Documentation]
* **Key Concepts:** [Link to Key Concepts Documentation]
* **Algorithms:** [Link to Algorithms Documentation]
* **Customizing RLlib:** [Link to Customization Documentation]