Contextualize Me – The Case for Context in Reinforcement Learning
Authors: Carolin Benjamins, Theresa Eimer, Frederik Schubert, Aditya Mohan, Sebastian Döhler, André Biedenkapp, Bodo Rosenhahn, Frank Hutter, Marius Lindauer
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability, and empirically validate this in the cRL framework via context-extended versions of common RL environments. These form the first benchmark library, CARL, designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed for further study of general agents. Using CARL, we show empirically that even simple RL environments become challenging in the contextual setting, that naive solutions do not generalize across complex context spaces, and that allowing RL agents access to context information is beneficial for generalization tasks in theory and practice. Details about the hyperparameter settings and hardware used for all experiments are listed in Appendix C. |
| Researcher Affiliation | Academia | Carolin Benjamins, Theresa Eimer, Frederik Schubert, Aditya Mohan, Sebastian Döhler, Bodo Rosenhahn, Marius Lindauer (Leibniz University Hannover); André Biedenkapp, Frank Hutter (University of Freiburg) |
| Pseudocode | No | The paper describes methods and concepts through text and mathematical formulations. It includes figures and tables but no explicit 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | All experiments can be reproduced using the scripts we provide with the benchmark library at https://github.com/automl/CARL. |
| Open Datasets | Yes | In our release of CARL benchmarks, we include and contextually extend classic control and Box2D environments from OpenAI Gym (Brockman et al., 2016), Google Brax walkers (Freeman et al., 2021), a selection from the DeepMind Control Suite (Tassa et al., 2018), an RNA folding environment (Runge et al., 2019), as well as Super Mario levels (Awiszus et al., 2020; Schubert et al., 2021), see Figure 4. |
| Dataset Splits | Yes | To generate instances of both environments, we vary the pole length across a uniform distribution p_C = U(0.25, 0.75) around the standard pole length for CartPole and across p_C = U(1, 2.2) for Pendulum. For training, we sample 64 contexts from this distribution and train a general agent that experiences all contexts during training in a round-robin fashion. Afterwards, each agent is evaluated on each context it was trained on for 10 episodes. For the train and test context sets, we sample 1000 contexts each from the train and test distributions defined in the evaluation protocol, see Figure 9. The test performances are discretized and aggregated across seeds by the bootstrapped mean using rliable (Agarwal et al., 2021). In Figure 9, we show that both hidden (context-oblivious) and visible (concatenate) agents perform fairly well within their training distribution for evaluation mode A and even generalize to fairly large areas of the test distribution, more so for concat. Large update intervals combined with extreme pole lengths prove to be the most challenging area. We repeat this with 10 random seeds and 5 test episodes per context. |
| Hardware Specification | Yes | Hardware: All experiments on all benchmarks were conducted on a Slurm CPU and GPU cluster (see Table 2). The CPU partition has 1592 CPUs available across nodes. Table 2 GPUs: 1× NVIDIA Quadro M5000, 56× NVIDIA RTX 2080 Ti, 12× NVIDIA RTX 2080 Ti, 6× NVIDIA GTX 1080 Ti, 4× NVIDIA GTX Titan X, 1× NVIDIA GT 640 |
| Software Dependencies | No | We implemented our own agents using coax (Holsheimer et al., 2023) with hyperparameters specified in Table 1. All experiments can be reproduced using the scripts we provide with the benchmark library at https://anonymous.4open.science/r/CARL-54F4/. The paper cites the `coax` library and its publication year, but does not specify a version number for `coax` itself, nor versions for other core software components such as Python or the deep-learning framework underlying the algorithms used (C51, SAC, PPO). |
| Experiment Setup | Yes | Details about the hyperparameter settings and hardware used for all experiments are listed in Appendix C. Table 1 lists hyperparameters per algorithm/environment combination, with columns: algorithm, env, n_step, gamma, alpha, batch_size, learning_rate, q_targ_tau, warmup_num_frames, pi_warmup_num_frames, pi_update_freq, replay_capacity, network {width: 256, num_atoms: 51}, pi_temperature, q_min_value, q_max_value. |
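The context-sampling and round-robin training protocol quoted in the Dataset Splits row can be sketched in a few lines. This is a minimal illustration, not CARL's actual API: the function names `sample_contexts` and `round_robin_schedule` are our own, and only the CartPole pole-length range U(0.25, 0.75) and the count of 64 training contexts come from the paper.

```python
import numpy as np

def sample_contexts(rng, low, high, n):
    """Sample n context values (e.g. pole lengths) from U(low, high)."""
    return rng.uniform(low, high, size=n)

def round_robin_schedule(contexts, n_episodes):
    """Yield one context per episode, cycling through the training set."""
    for episode in range(n_episodes):
        yield contexts[episode % len(contexts)]

rng = np.random.default_rng(0)

# CartPole: 64 training contexts with pole length drawn from U(0.25, 0.75),
# as described in the evaluation protocol above.
train_contexts = sample_contexts(rng, 0.25, 0.75, 64)

# Each training episode sees the next context in turn (round robin).
schedule = list(round_robin_schedule(train_contexts, 128))
```

After 64 episodes the schedule wraps around, so every context is experienced equally often during training, matching the "round robin fashion" described in the paper.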