Self-supervised Color Generalization in Reinforcement Learning

Authors: Matthias Weissenbacher, Evangelos Routis, Yoshinobu Kawahara

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our method in the MiniGrid, Procgen, and DeepMind Control suites and find improved color sensitivity and generalisation.
Researcher Affiliation | Collaboration | Matthias Weissenbacher (RIKEN Center for Advanced Intelligence Project; Pyr-SAI Labs, Japan), Evangelos Routis (Causaly, London, United Kingdom), Yoshinobu Kawahara (RIKEN Center for Advanced Intelligence Project; Osaka University, Japan)
Pseudocode | No | The paper describes algorithms like rDMD and CiL mathematically and in narrative text, but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | In section 4.2 we perform our main experiments on the Procgen environment. The code is made public at GitHub.
Open Datasets | Yes | We empirically evaluate our method in the MiniGrid, Procgen, and DeepMind Control suites... The Lava Crossing environment, a standard in the MiniGrid toolkit (Chevalier-Boisvert et al., 2019)... The Procgen benchmark consists of sixteen procedurally generated games... Procgen generalization benchmark (Cobbe et al., 2020)... DeepMind Control suite (DMControl) (Tassa et al., 2018).
Dataset Splits | Yes | Following the setup from (Cobbe et al., 2020), agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated by sampling seeds uniformly at random from all computer integers).
Hardware Specification | Yes | All experiments were performed on NVIDIA A100 or V100 GPUs.
Software Dependencies | No | The paper mentions 'torch.svd' and the algorithms 'PPO/DrAC' and 'SAC', but does not provide specific version numbers for these or other software libraries.
Experiment Setup | Yes | We summarize the hyperparameter choices in Table 5. Table 5: Architecture and hyper-parameter choices for CiL on Procgen, DMControl, and MiniGrid, based on (Raileanu et al., 2020), (Hansen & Wang, 2021), and (Jiang et al., 2021), respectively. Channels refer to the category channels. We use the algorithms in the code-base without any hyper-parameter changes except for a reduction of the hidden dimension of the actor-critic networks to 64. The patch size follows the convention in Vision Transformers; for a 64x64 pixel input, we use 8x8 patches.
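The train/test split described in the Dataset Splits row (a fixed set of 200 training levels, testing on levels sampled from the full seed space) can be sketched in a few lines. The helper names and the 31-bit seed bound below are assumptions for illustration, not the paper's or Procgen's actual code.

```python
import random

# Fixed training set: Procgen levels generated from seeds 1..200,
# as in the setup of Cobbe et al. (2020).
TRAIN_LEVEL_SEEDS = list(range(1, 201))

def sample_test_seed(rng: random.Random) -> int:
    # "All computer integers" is approximated here as the 31-bit
    # non-negative range (an assumption about the seed bound).
    return rng.randrange(0, 2**31)

rng = random.Random(0)
test_seed = sample_test_seed(rng)  # a level seed for evaluation
```

At test time, almost every sampled level falls outside the 200 training seeds, so the evaluation measures generalization to unseen levels rather than memorization.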
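The patch convention mentioned in the Experiment Setup row (8x8 patches over a 64x64 input, giving 64 non-overlapping patches) can be illustrated with a minimal NumPy sketch; `patchify` is a hypothetical helper for illustration, not the authors' implementation.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (patch, patch, C)
    tiles, Vision Transformer style, returned as (N, patch, patch, C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    # Reshape into (row-blocks, patch, col-blocks, patch, C), then bring
    # the two block axes together before flattening them into N patches.
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    return tiles.reshape(-1, patch, patch, c)

obs = np.zeros((64, 64, 3), dtype=np.uint8)  # a 64x64 RGB observation
print(patchify(obs).shape)  # (64, 8, 8, 3)
```

With a 64x64 input and 8x8 patches this yields (64/8) * (64/8) = 64 patches, ordered row-major over the patch grid.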