MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning
Authors: Raghu Rajan, Jessica Lizeth Borja Diaz, Suresh Guttikonda, Fabio Ferreira, André Biedenkapp, Jan Ole von Hartz, Frank Hutter
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present MDP Playground, a testbed for Reinforcement Learning (RL) agents with dimensions of hardness that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in toy and complex RL environments. ... We then show how to design experiments using MDP Playground to gain insights on the toy environments. We also provide wrappers that can inject many of these dimensions into any Gym environment. We experiment with these wrappers on Atari and Mujoco to allow for understanding the effects of these dimensions on environments that are more complex than the toy environments. ... We evaluated RLlib implementations ... on discrete environments and DDPG ... on continuous environments over grids of values for the dimensions of hardness. ... We evaluated 100 seeds for each of these experiments. |
| Researcher Affiliation | Academia | Raghu Rajan (University of Freiburg), Jessica Lizeth Borja Diaz (University of Freiburg), Suresh Guttikonda (University of Freiburg), Fabio Ferreira (University of Freiburg), André Biedenkapp (University of Freiburg), Jan Ole von Hartz (University of Freiburg), Frank Hutter (University of Freiburg) |
| Pseudocode | Yes | Appendix B.3 Algorithms for Automatically Generated Toy MDPs with MDP Playground Algorithm 1 Automatically Generated Discrete Toy MDPs with MDP Playground Algorithm 2 Automatically Generated Continuous Toy MDPs with MDP Playground |
| Open Source Code | Yes | The GitHub repository7 describes how to run experiments and how to analyse them with plots in a Jupyter Notebook. Section 6 describes some experiments and analyses of this kind. 7. https://github.com/automl/mdp-playground ... Finally, we would like MDP Playground to be a community-driven effort. It is open-source for the benefit of the RL community at https://github.com/automl/mdp-playground. |
| Open Datasets | Yes | We also provide wrappers that can inject many of these dimensions into any Gym environment. We experiment with these wrappers on Atari and Mujoco to allow for understanding the effects of these dimensions on environments that are more complex than the toy environments. ... For the complex experiments, we used Atari and Mujoco. For Atari, we ran the agents on Beam Rider, Breakout, Qbert and Space Invaders. For Mujoco, we ran the agents on Half Cheetah V3, Pusher and Reacher using mujoco-py. |
| Dataset Splits | No | The paper describes generating toy environments with specific parameters (e.g., "For the discrete toy experiments, we created a simple toy task with S and A set to 8.") and using existing complex environments like Atari and Mujoco, for which it specifies total training steps and seeds (e.g., "We evaluated 5 seeds for 10M steps (40M frames) for Atari, 500k steps for Pusher and Reacher and 3M for Half Cheetah."). However, it does not provide explicit training/test/validation dataset splits for any of these environments. |
| Hardware Specification | Yes | DQN delay experiments from Section 6 (performed using Ray RLlib (Liang et al., 2018) with a network with 2 hidden layers of 256 units each) took on average 35s for a complete run of DQN for 20 000 environment steps. In this setting, we restricted Ray RLlib and the underlying Tensorflow to run on one core of a laptop (core-i7-8850H; the full CPU specifications can be found in Appendix H). ... Appendix H. CPU Specifications: Cluster experiments were run on Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz cores for approximately 300000 CPU hours. ... The laptop CPU core specifications were: ... model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz |
| Software Dependencies | Yes | Older experiments on the discrete toy environments were run with Ray 0.7.3, while for the newer continuous and complex environments, they were run with Ray 0.9.0. |
| Experiment Setup | Yes | We used the following experimental setup for this section. Agents: We evaluated RLlib implementations (Liang et al., 2018) of DQN (Mnih et al., 2015), Rainbow DQN (Hessel et al., 2018), and A3C (Mnih et al., 2016) on discrete environments and DDPG (Lillicrap et al., 2016), TD3 (Fujimoto et al., 2018) and SAC (Haarnoja et al., 2018) on continuous environments over grids of values for the dimensions of hardness. The information state used by the agent was always the observation given out by the environment. Hyperparameters (including neural network architectures used) and the tuning procedure used are specified in Appendices F and G. We used fully connected networks except for image representations where we used Convolutional Neural Networks (CNNs). ... We evaluated 100 seeds for each of these experiments. We ran the DQN variants for 20k timesteps and A3C for 150k timesteps. |
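The quoted setup, an environment wrapper injecting a "dimension of hardness" (here, reward delay), swept over a grid of values with multiple seeds, can be sketched in a few lines. Note that `ToyEnv` and `RewardDelayWrapper` below are hypothetical illustrations written for this summary, not MDP Playground's actual API; the real implementation is at https://github.com/automl/mdp-playground.

```python
from collections import deque
import random

class ToyEnv:
    """Hypothetical stand-in environment (illustration only): an 8-state
    chain walk that pays reward 1 on reaching the last state."""
    def __init__(self, n_states=8, seed=0):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped to the chain)
        step = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + step))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done, {}

class RewardDelayWrapper:
    """Sketch of the 'delay' dimension of hardness: each reward is emitted
    `delay` steps after it is earned, via a FIFO buffer of zeros."""
    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque([0.0] * delay)

    def reset(self):
        self.buffer = deque([0.0] * self.delay)
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(reward)          # enqueue the true reward
        return obs, self.buffer.popleft(), done, info  # emit a stale one

# Grid of delay values x seeds, echoing the paper's experiment design
# (grids over dimensions of hardness, many seeds per cell).
returns = {}
for delay in [0, 1, 2, 4]:
    per_seed = []
    for seed in range(5):
        env = RewardDelayWrapper(ToyEnv(seed=seed), delay)
        rng = random.Random(seed)
        obs, total, done, t = env.reset(), 0.0, False, 0
        while not done and t < 100:
            obs, r, done, _ = env.step(rng.choice([0, 1]))
            total += r
            t += 1
        per_seed.append(total)
    returns[delay] = sum(per_seed) / len(per_seed)
```

With `delay > 0` the episode can terminate before the goal reward leaves the buffer, so the observed return degrades as delay grows, which is exactly the credit-assignment difficulty the dimension is meant to induce, controlled independently of everything else about the environment.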