Stable-Baselines3: Reliable Reinforcement Learning Implementations
Authors: Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Each algorithm has been benchmarked on common environments (Raffin and Stulp, 2020) and compared to prior implementations. Our test suite covers 95% of the code and, together with our active user base scrutinizing changes, ensures that any implementation errors are minimized. Algorithms are verified against published results by comparing the agent learning curves. |
| Researcher Affiliation | Collaboration | 1 Robotics and Mechatronics Center (RMC), German Aerospace Center (DLR), Weßling, Germany 2 Interactive Robotics Laboratory, University Paris-Saclay, CEA, Palaiseau, France 3 Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA 4 School of Computing, University of Eastern Finland, Joensuu, Finland 5 Kiteswarms GmbH, Freiburg, Germany |
| Pseudocode | Yes | `import gym`<br>`from stable_baselines3 import SAC`<br>`# Train an agent using Soft Actor-Critic on Pendulum-v0`<br>`env = gym.make("Pendulum-v0")`<br>`model = SAC("MlpPolicy", env).learn(total_timesteps=20000)`<br>`# Save the model`<br>`model.save("sac_pendulum")`<br>`# Load the trained model`<br>`model = SAC.load("sac_pendulum")`<br>`# Start a new episode`<br>`obs = env.reset()`<br>`# What action to take in state obs?`<br>`action, _ = model.predict(obs, deterministic=True)`<br>Figure 1: Using Stable-Baselines3 to train, save, load, and infer an action from a policy. |
| Open Source Code | Yes | Our documentation, examples, and source-code are available at https://github.com/DLR-RM/stable-baselines3. |
| Open Datasets | Yes | Each algorithm has been benchmarked on common environments (Raffin and Stulp, 2020) and compared to prior implementations. We support logging to CSV files and TensorBoard. Users can log custom metrics and modify training via user-provided callbacks. To speed up training, we support parallel (or vectorized) environments. To simplify training, we implement common environment wrappers, like preprocessing Atari observations to match the original DQN experiments (Mnih et al., 2015). |
| Dataset Splits | No | The paper mentions evaluating agents on 'common environments' and 'evaluating in a separate environment', which implies data partitioning for training and testing. It also mentions 'preprocessing Atari observations to match the original DQN experiments (Mnih et al., 2015)', suggesting the use of standard splits for those environments. However, it does not explicitly detail the percentages, sample counts, or specific methodology for creating dataset splits used in its own reported experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'PyTorch' and 'gym', and tools like 'Optuna', but it does not provide specific version numbers for these software dependencies. For instance, it mentions 'PyTorch (Paszke et al., 2019)', citing a paper, but not a version number like 'PyTorch 1.9'. |
| Experiment Setup | No | The paper states that 'RL Baselines Zoo (Raffin, 2018, 2020b) provides scripts to train and evaluate agents, tune hyperparameters, record videos, store experiment setup and visualize results.' and 'We also include a collection of pre-trained reinforcement learning agents together with tuned hyperparameters'. While it indicates that hyperparameters exist and are available through RL Baselines Zoo, it does not explicitly list any specific hyperparameter values or training configurations within the main text of the paper itself. |
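The "parallel (or vectorized) environments" quoted in the Open Datasets row can be illustrated with a minimal, framework-free sketch. The class and method names below (`ToyEnv`, `SimpleVecEnv`) are hypothetical stand-ins, not the real `stable_baselines3` `VecEnv` API; the sketch only shows the core idea of stepping several environments in lockstep and auto-resetting finished ones.

```python
# Conceptual sketch of a vectorized-environment wrapper (hypothetical names,
# not the actual stable_baselines3 VecEnv API). Each sub-environment is a
# simple counter that terminates after 3 steps.

class ToyEnv:
    """Stand-in environment: observation is a step counter, done after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, {}

class SimpleVecEnv:
    """Steps several environments at once, auto-resetting any that finish."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, _ = env.step(action)
            if d:  # auto-reset so training code never sees a dead environment
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones

vec = SimpleVecEnv([ToyEnv for _ in range(2)])
obs = vec.reset()                     # [0, 0]
for _ in range(3):
    obs, rewards, dones = vec.step([None, None])
print(obs, dones)                     # after 3 steps both envs finish and reset
```

Auto-resetting inside `step` is the design choice that lets a training loop collect batches of transitions without per-environment bookkeeping, which is what makes vectorization a straightforward speed-up.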
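The Open Datasets row also quotes support for "logging to CSV files" and modifying training "via user-provided callbacks". A minimal sketch of that pattern, using only the standard library (the `train` loop and `log_to_csv` callback are hypothetical illustrations, not SB3's `BaseCallback` interface):

```python
import csv
import io

# Hypothetical sketch of callback-driven metric logging: a training loop
# invokes a user-provided callback every step, and the callback appends
# custom metrics to a CSV writer.

def train(num_steps, callback):
    """Toy training loop: accumulates a stand-in reward, notifying the callback."""
    reward = 0.0
    for step in range(num_steps):
        reward += 1.0  # stand-in for an environment reward
        callback(step=step, reward=reward)

buffer = io.StringIO()         # in-memory file; a real run would open a .csv
writer = csv.writer(buffer)
writer.writerow(["step", "reward"])

def log_to_csv(step, reward):
    """User-provided callback: record one metric row per training step."""
    writer.writerow([step, reward])

train(3, log_to_csv)
rows = buffer.getvalue().strip().splitlines()
print(rows)  # header line plus one row per training step
```

Because the callback is just a callable handed to the loop, the same hook point can also be used to change training behavior (e.g. early stopping), which is the flexibility the paper's callback mechanism provides.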