Self-supervised Color Generalization in Reinforcement Learning
Authors: Matthias Weissenbacher, Evangelos Routis, Yoshinobu Kawahara
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our method in the MiniGrid, Procgen, and DeepMind Control suites and find improved color sensitivity and generalisation. |
| Researcher Affiliation | Collaboration | Matthias Weissenbacher (RIKEN Center for Advanced Intelligence Project; Pyr-SAI Labs, Japan); Evangelos Routis (Causaly, London, United Kingdom); Yoshinobu Kawahara (RIKEN Center for Advanced Intelligence Project; Osaka University, Japan) |
| Pseudocode | No | The paper describes algorithms like rDMD and CiL mathematically and in narrative text, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | In section 4.2 we perform our main experiments on the Procgen environment. The code is made public on GitHub. |
| Open Datasets | Yes | We empirically evaluate our method in the MiniGrid, Procgen, and DeepMind Control suites... The Lava Crossing environment, a standard in the MiniGrid toolkit (Chevalier-Boisvert et al., 2019)... The Procgen benchmark consists of sixteen procedurally generated games... Procgen generalization benchmark (Cobbe et al., 2020)... DeepMind Control suite (DMControl) (Tassa et al., 2018). |
| Dataset Splits | Yes | Following the setup from (Cobbe et al., 2020), agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated by sampling seeds uniformly at random from all computer integers). |
| Hardware Specification | Yes | All experiments were performed on NVIDIA A100 or V100 GPUs. |
| Software Dependencies | No | The paper mentions 'torch.svd' and algorithms 'PPO/DrAC' and 'SAC' but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | We summarize the hyperparameter choices in Table 5. Table 5: Architecture and hyper-parameter choices for CiL on Procgen, DMControl, MiniGrid based on (Raileanu et al., 2020), (Hansen & Wang, 2021), and (Jiang et al., 2021), respectively. Channels refer to the category channels. We use the algorithms in the code-base without any hyper-parameter changes except for reduction of hidden-dim of the actor-critic networks to 64. The patch size follows the convention in Vision Transformers; for a 64x64 pixel input, we use 8x8 patches. |
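The train/test split protocol quoted in the Dataset Splits row (a fixed set of 200 training levels from seeds 1 to 200, with test levels sampled uniformly from the full integer seed range) can be sketched in plain Python. This is an illustrative sketch only; the names `TRAIN_SEEDS` and `sample_test_seed` are hypothetical and do not come from the paper or the Procgen codebase.

```python
import random

# Fixed training set: n = 200 levels generated from seeds 1..200
# (following the setup of Cobbe et al., 2020, as quoted above).
TRAIN_SEEDS = list(range(1, 201))

def sample_test_seed(rng: random.Random) -> int:
    # Test levels: a seed drawn uniformly at random; here we assume the
    # non-negative 31-bit integer range as a stand-in for "all computer integers".
    return rng.randrange(0, 2**31)

rng = random.Random(0)
test_seeds = [sample_test_seed(rng) for _ in range(5)]
```

The key property of the protocol is that the training distribution is a small fixed subset of the seed space, while evaluation samples from the whole space, so most test levels are unseen during training.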
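The patch convention mentioned in the Experiment Setup row (8x8 patches on a 64x64 pixel input, following Vision Transformers) determines a fixed number of patches per image. A minimal check, where `num_patches` is a hypothetical helper, not a function from the paper's code:

```python
def num_patches(image_size: int = 64, patch_size: int = 8) -> int:
    # Non-overlapping tiling of a square image: (image_size // patch_size) ** 2
    # patches, e.g. a 64x64 input with 8x8 patches gives 8 * 8 = 64 patches.
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    return (image_size // patch_size) ** 2
```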