MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning
Authors: Raghu Rajan, Jessica Lizeth Borja Diaz, Suresh Guttikonda, Fabio Ferreira, André Biedenkapp, Jan Ole von Hartz, Frank Hutter
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present MDP Playground, a testbed for Reinforcement Learning (RL) agents with dimensions of hardness that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in toy and complex RL environments. ... We then show how to design experiments using MDP Playground to gain insights on the toy environments. We also provide wrappers that can inject many of these dimensions into any Gym environment. We experiment with these wrappers on Atari and Mujoco to allow for understanding the effects of these dimensions on environments that are more complex than the toy environments. ... We evaluated RLlib implementations ... on discrete environments and DDPG ... on continuous environments over grids of values for the dimensions of hardness. ... We evaluated 100 seeds for each of these experiments. |
| Researcher Affiliation | Academia | Raghu Rajan (University of Freiburg), Jessica Lizeth Borja Diaz (University of Freiburg), Suresh Guttikonda (University of Freiburg), Fabio Ferreira (University of Freiburg), André Biedenkapp (University of Freiburg), Jan Ole von Hartz (University of Freiburg), Frank Hutter (University of Freiburg) |
| Pseudocode | Yes | Appendix B.3 Algorithms for Automatically Generated Toy MDPs with MDP Playground Algorithm 1 Automatically Generated Discrete Toy MDPs with MDP Playground Algorithm 2 Automatically Generated Continuous Toy MDPs with MDP Playground |
| Open Source Code | Yes | The GitHub repository7 describes how to run experiments and how to analyse them with plots in a Jupyter Notebook. Section 6 describes some experiments and analyses of this kind. 7. https://github.com/automl/mdp-playground ... Finally, we would like MDP Playground to be a community-driven effort. It is open-source for the benefit of the RL community at https://github.com/automl/mdp-playground. |
| Open Datasets | Yes | We also provide wrappers that can inject many of these dimensions into any Gym environment. We experiment with these wrappers on Atari and Mujoco to allow for understanding the effects of these dimensions on environments that are more complex than the toy environments. ... For the complex experiments, we used Atari and Mujoco. For Atari, we ran the agents on Beam Rider, Breakout, Qbert and Space Invaders. For Mujoco, we ran the agents on Half Cheetah V3, Pusher and Reacher using mujoco-py. |
| Dataset Splits | No | The paper describes generating toy environments with specific parameters (e.g., "For the discrete toy experiments, we created a simple toy task with S and A set to 8.") and using existing complex environments like Atari and Mujoco, for which it specifies total training steps and seeds (e.g., "We evaluated 5 seeds for 10M steps (40M frames) for Atari, 500k steps for Pusher and Reacher and 3M for Half Cheetah."). However, it does not provide explicit training/test/validation dataset splits for any of these environments. |
| Hardware Specification | Yes | DQN delay experiments from Section 6 (performed using Ray RLlib (Liang et al., 2018) with a network with 2 hidden layers of 256 units each) took on average 35s for a complete run of DQN for 20 000 environment steps. In this setting, we restricted Ray RLlib and the underlying Tensorflow to run on one core of a laptop (core-i7-8850H; the full CPU specifications can be found in Appendix H). ... Appendix H. CPU Specifications: Cluster experiments were run on Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz cores for approximately 300000 CPU hours. ... The laptop CPU core specifications were: ... model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz |
| Software Dependencies | Yes | Older experiments on the discrete toy environments were run with Ray 0.7.3, while for the newer continuous and complex environments, they were run with Ray 0.9.0. |
| Experiment Setup | Yes | We used the following experimental setup for this section. Agents: We evaluated RLlib implementations (Liang et al., 2018) of DQN (Mnih et al., 2015), Rainbow DQN (Hessel et al., 2018), and A3C (Mnih et al., 2016) on discrete environments and DDPG (Lillicrap et al., 2016), TD3 (Fujimoto et al., 2018) and SAC (Haarnoja et al., 2018) on continuous environments over grids of values for the dimensions of hardness. The information state used by the agent was always the observation given out by the environment. Hyperparameters (including neural network architectures used) and the tuning procedure used are specified in Appendices F and G. We used fully connected networks except for image representations where we used Convolutional Neural Networks (CNNs). ... We evaluated 100 seeds for each of these experiments. We ran the DQN variants for 20k timesteps and A3C for 150k timesteps. |
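The quoted setup, an environment wrapper injecting a "dimension of hardness" (here, reward delay), swept over a grid of values with multiple seeds, can be sketched in a few lines. Note that `ToyEnv` and `RewardDelayWrapper` below are hypothetical illustrations written for this summary, not MDP Playground's actual API; the real implementation is at https://github.com/automl/mdp-playground.

```python
from collections import deque
import random

class ToyEnv:
    """Hypothetical stand-in environment (illustration only): an 8-state
    chain walk that pays reward 1 on reaching the last state."""
    def __init__(self, n_states=8, seed=0):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped to the chain)
        step = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + step))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done, {}

class RewardDelayWrapper:
    """Sketch of the 'delay' dimension of hardness: each reward is emitted
    `delay` steps after it is earned, via a FIFO buffer of zeros."""
    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque([0.0] * delay)

    def reset(self):
        self.buffer = deque([0.0] * self.delay)
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(reward)          # enqueue the true reward
        return obs, self.buffer.popleft(), done, info  # emit a stale one

# Grid of delay values x seeds, echoing the paper's experiment design
# (grids over dimensions of hardness, many seeds per cell).
returns = {}
for delay in [0, 1, 2, 4]:
    per_seed = []
    for seed in range(5):
        env = RewardDelayWrapper(ToyEnv(seed=seed), delay)
        rng = random.Random(seed)
        obs, total, done, t = env.reset(), 0.0, False, 0
        while not done and t < 100:
            obs, r, done, _ = env.step(rng.choice([0, 1]))
            total += r
            t += 1
        per_seed.append(total)
    returns[delay] = sum(per_seed) / len(per_seed)
```

With `delay > 0` the episode can terminate before the goal reward leaves the buffer, so the observed return degrades as delay grows, which is exactly the credit-assignment difficulty the dimension is meant to induce, controlled independently of everything else about the environment.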