Stable-Baselines3: Reliable Reinforcement Learning Implementations
Authors: Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Each algorithm has been benchmarked on common environments (Raffin and Stulp, 2020) and compared to prior implementations. Our test suite covers 95% of the code and, together with our active user base scrutinizing changes, ensures that any implementation errors are minimized. Algorithms are verified against published results by comparing the agent learning curves. |
| Researcher Affiliation | Collaboration | 1 Robotics and Mechatronics Center (RMC), German Aerospace Center (DLR), Weßling, Germany 2 Interactive Robotics Laboratory, University Paris-Saclay, CEA, Palaiseau, France 3 Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA 4 School of Computing, University of Eastern Finland, Joensuu, Finland 5 Kiteswarms GmbH, Freiburg, Germany |
| Pseudocode | Yes | `import gym`<br>`from stable_baselines3 import SAC`<br>`# Train an agent using Soft Actor-Critic on Pendulum-v0`<br>`env = gym.make("Pendulum-v0")`<br>`model = SAC("MlpPolicy", env).learn(total_timesteps=20000)`<br>`# Save the model`<br>`model.save("sac_pendulum")`<br>`# Load the trained model`<br>`model = SAC.load("sac_pendulum")`<br>`# Start a new episode`<br>`obs = env.reset()`<br>`# What action to take in state obs?`<br>`action, _ = model.predict(obs, deterministic=True)`<br>Figure 1: Using Stable-Baselines3 to train, save, load, and infer an action from a policy. |
| Open Source Code | Yes | Our documentation, examples, and source-code are available at https://github.com/DLR-RM/stable-baselines3. |
| Open Datasets | Yes | Each algorithm has been benchmarked on common environments (Raffin and Stulp, 2020) and compared to prior implementations. We support logging to CSV files and TensorBoard. Users can log custom metrics and modify training via user-provided callbacks. To speed up training, we support parallel (or vectorized) environments. To simplify training, we implement common environment wrappers, like preprocessing Atari observations to match the original DQN experiments (Mnih et al., 2015). |
| Dataset Splits | No | The paper mentions evaluating agents on 'common environments' and 'evaluating in a separate environment', which implies data partitioning for training and testing. It also mentions 'preprocessing Atari observations to match the original DQN experiments (Mnih et al., 2015)', suggesting the use of standard splits for those environments. However, it does not explicitly detail the percentages, sample counts, or specific methodology for creating dataset splits used in its own reported experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'PyTorch' and 'gym', and tools like 'Optuna', but it does not provide specific version numbers for these software dependencies. For instance, it mentions 'PyTorch (Paszke et al., 2019)', citing a paper, but not a version number like 'PyTorch 1.9'. |
| Experiment Setup | No | The paper states that 'RL Baselines Zoo (Raffin, 2018, 2020b) provides scripts to train and evaluate agents, tune hyperparameters, record videos, store experiment setup and visualize results.' and 'We also include a collection of pre-trained reinforcement learning agents together with tuned hyperparameters'. While it indicates that hyperparameters exist and are available through RL Baselines Zoo, it does not explicitly list any specific hyperparameter values or training configurations within the main text of the paper itself. |
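The "parallel (or vectorized) environments" quoted in the Open Datasets row can be illustrated with a minimal, framework-free sketch. The class and method names below (`ToyEnv`, `SimpleVecEnv`) are hypothetical stand-ins, not the real `stable_baselines3` `VecEnv` API; the sketch only shows the core idea of stepping several environments in lockstep and auto-resetting finished ones.

```python
# Conceptual sketch of a vectorized-environment wrapper (hypothetical names,
# not the actual stable_baselines3 VecEnv API). Each sub-environment is a
# simple counter that terminates after 3 steps.

class ToyEnv:
    """Stand-in environment: observation is a step counter, done after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, {}

class SimpleVecEnv:
    """Steps several environments at once, auto-resetting any that finish."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, _ = env.step(action)
            if d:  # auto-reset so training code never sees a dead environment
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones

vec = SimpleVecEnv([ToyEnv for _ in range(2)])
obs = vec.reset()                     # [0, 0]
for _ in range(3):
    obs, rewards, dones = vec.step([None, None])
print(obs, dones)                     # after 3 steps both envs finish and reset
```

Auto-resetting inside `step` is the design choice that lets a training loop collect batches of transitions without per-environment bookkeeping, which is what makes vectorization a straightforward speed-up.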
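The Open Datasets row also quotes support for "logging to CSV files" and modifying training "via user-provided callbacks". A minimal sketch of that pattern, using only the standard library (the `train` loop and `log_to_csv` callback are hypothetical illustrations, not SB3's `BaseCallback` interface):

```python
import csv
import io

# Hypothetical sketch of callback-driven metric logging: a training loop
# invokes a user-provided callback every step, and the callback appends
# custom metrics to a CSV writer.

def train(num_steps, callback):
    """Toy training loop: accumulates a stand-in reward, notifying the callback."""
    reward = 0.0
    for step in range(num_steps):
        reward += 1.0  # stand-in for an environment reward
        callback(step=step, reward=reward)

buffer = io.StringIO()         # in-memory file; a real run would open a .csv
writer = csv.writer(buffer)
writer.writerow(["step", "reward"])

def log_to_csv(step, reward):
    """User-provided callback: record one metric row per training step."""
    writer.writerow([step, reward])

train(3, log_to_csv)
rows = buffer.getvalue().strip().splitlines()
print(rows)  # header line plus one row per training step
```

Because the callback is just a callable handed to the loop, the same hook point can also be used to change training behavior (e.g. early stopping), which is the flexibility the paper's callback mechanism provides.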