Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Authors: Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.
Researcher Affiliation | Collaboration | Jack Parker-Holder (EMAIL), University of Oxford; Raghu Rajan (EMAIL), University of Freiburg; Xingyou Song (EMAIL), Google Research, Brain Team; André Biedenkapp (EMAIL), University of Freiburg; Yingjie Miao (EMAIL), Google Research, Brain Team; Theresa Eimer (EMAIL), Leibniz University Hannover; Baohe Zhang (EMAIL), University of Freiburg; Vu Nguyen (EMAIL), Amazon Australia; Roberto Calandra (EMAIL), Meta AI; Aleksandra Faust (EMAIL), Google Research, Brain Team; Frank Hutter (EMAIL), University of Freiburg & Bosch Center for Artificial Intelligence; Marius Lindauer (EMAIL), Leibniz University Hannover
Pseudocode | No | The paper describes various AutoRL methods and algorithms conceptually, and uses diagrams such as Figure 1 and Figure 2 to illustrate components and loops, but it does not contain any structured pseudocode or algorithm blocks with step-by-step instructions.
Open Source Code | No | The paper is a survey and does not present original methodology requiring a code release. It mentions third-party libraries such as JAX, TensorFlow, and PyTorch in the context of autodifferentiation, but does not claim to release any code for the survey itself.
Open Datasets | Yes | The paper mentions and cites several well-known public benchmarks and environments used in reinforcement learning research, including "OpenAI Gym (Brockman et al., 2016)", "Arcade Learning Environment (Bellemare et al., 2012)", "OpenAI Procgen (Cobbe et al., 2020)", "CoinRun (Cobbe et al., 2019a)", "MiniGrid (Chevalier-Boisvert et al., 2018)", "NetHack (Küttler et al., 2020)", "MineRL (Guss et al., 2019)", and "Meta-World (Yu et al., 2019)".
Dataset Splits | No | The paper is a survey that discusses concepts related to training and validation rewards and references how other works use dataset distributions. For instance, it states: "f(ζ, θ) can be defined as the validation reward, i.e. the reward in the outer loop, whereas J(θ; ζ) can be considered the training reward, i.e. the reward in the inner loop." However, it does not specify any dataset splits of its own, because as a survey it conducts no original experiments.
Hardware Specification | No | The paper is a survey and does not describe experiments conducted by the authors. While it mentions resource requirements for certain methods (e.g., "thousands of CPU cores" for evolutionary approaches, or "massively parallel simulation with a single GPU" in the context of the Brax physics engine for future work), it does not specify any particular hardware used by the authors for their own work.
Software Dependencies | No | The paper mentions popular machine learning frameworks such as "Jax (Bradbury et al., 2018), Tensorflow (Abadi et al., 2015), Pytorch (Paszke et al., 2019)" in Section 4.6. However, these are cited as examples of readily available autodifferentiation libraries in the context of gradient-based meta-learning, not as specific versioned software dependencies for the paper's own methodology.
Experiment Setup | No | The paper is a survey that discusses the importance of various hyperparameters (e.g., the discount factor γ and batch size B) and methods for their optimization in AutoRL. For example, Section 3.4 is titled "Last but not Least: What about Hyperparameters?". However, as a survey, it does not present specific experimental results or provide concrete hyperparameter values or training configurations used in its own empirical work.
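The inner/outer-loop objective quoted in the Dataset Splits row, with training reward J(θ; ζ) in the inner loop and validation reward f(ζ, θ) in the outer loop, can be sketched as a minimal bilevel search. This is an illustrative toy only, not the survey's method: the two-armed bandit environment, the ε-greedy inner loop, and the use of the learning rate as the hyperparameter ζ are all assumptions made for the sake of a runnable example.

```python
import random

def inner_train(zeta, episodes=200, seed=0):
    """Inner loop: maximize training reward J(theta; zeta) on a toy 2-armed bandit.

    theta is a pair of action-value estimates; the hyperparameter zeta
    plays the role of the inner-loop learning rate.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]           # value estimate per arm
    payout = (0.3, 0.7)          # true success probability per arm (assumed)
    for _ in range(episodes):
        # epsilon-greedy action selection during training
        if rng.random() < 0.1:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: theta[i])
        r = 1.0 if rng.random() < payout[a] else 0.0
        theta[a] += zeta * (r - theta[a])   # incremental update, step size zeta
    return theta

def validation_reward(theta, episodes=500, seed=1):
    """Outer objective f(zeta, theta): greedy policy scored on held-out rollouts."""
    rng = random.Random(seed)
    payout = (0.3, 0.7)
    a = max(range(2), key=lambda i: theta[i])
    return sum(rng.random() < payout[a] for _ in range(episodes)) / episodes

# Outer loop: random/grid search over the hyperparameter zeta,
# selecting the candidate with the highest validation reward.
best = max(
    ((z, validation_reward(inner_train(z))) for z in (0.01, 0.1, 0.5)),
    key=lambda zr: zr[1],
)
print(best)  # (chosen zeta, its validation reward)
```

Any outer-loop optimizer discussed in the survey (Bayesian optimization, evolution, population-based training) could replace the grid search here; the point is only the separation of the two reward signals.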