Learning Representations for Pixel-based Control: What Matters and Why?

Authors: Manan Tomar, Utkarsh Aashu Mishra, Amy Zhang, Matthew E. Taylor

TMLR 2023

Reproducibility checklist (Variable: Result — LLM response)
Research Type: Experimental — "We conduct experiments across multiple settings, including the MuJoCo domains from DMC Suite (Tassa et al., 2018) with natural distractors (Zhang et al., 2018; Kay et al., 2017; Stone et al., 2021), and Atari100K (Kaiser et al., 2019) from ALE (Bellemare et al., 2013)."
Researcher Affiliation: Collaboration — Manan Tomar (University of Alberta; Amii), Utkarsh A. Mishra (University of Alberta; Amii), Amy Zhang (FAIR, Menlo Park; University of California, Berkeley), Matthew E. Taylor (University of Alberta; Amii)
Pseudocode: No — The paper describes methods and losses using textual descriptions and mathematical equations, such as L_Baseline and L_DREAMER, but does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes — Code available: https://github.com/UtkarshMishra04/pixel-representations-RL
Open Datasets: Yes — "We conduct experiments across multiple settings, including the MuJoCo domains from DMC Suite (Tassa et al., 2018) with natural distractors (Zhang et al., 2018; Kay et al., 2017; Stone et al., 2021), and Atari100K (Kaiser et al., 2019) from ALE (Bellemare et al., 2013)." Kinetics dataset: https://github.com/Showmax/kinetics-downloader
Dataset Splits: No — The paper describes training and evaluating agents within reinforcement learning environments (DMC Suite and Atari100K) over environment steps or episodes, which is standard for RL. However, it does not specify explicit static train/validation/test splits with percentages or sample counts, as such splits are not applicable in the same way as for supervised learning on fixed datasets.
Hardware Specification: Yes — "All experiments were conducted on either system configuration of: 1. 6 CPU cores of Intel Gold 6148 Skylake @ 2.4 GHz, one NVIDIA V100 SXM2 (16 GB memory) GPU, and 84 GB RAM. 2. 6 CPU cores of Intel Xeon Gold 5120 Skylake @ 2.2 GHz, one NVIDIA V100 Volta (16 GB HBM2 memory) GPU, and 84 GB RAM."
Software Dependencies: No — The paper mentions using the SAC algorithm and references existing open-source implementations for certain architectures, but it does not specify versions for core software dependencies such as Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA libraries.
Experiment Setup: Yes — "The full set of hyperparameters used for the baseline experiments are provided in Table 3 below." Table 3 ("Hyperparameters for Baseline and related ablations") lists: observation shape, latent dimension, replay buffer size, initial steps, stacked frames, action repeat, SAC hidden units, transition network details, reward network details, evaluation episodes, optimizer, beta values, learning rates, batch size, Q-function EMA, critic target update frequency, convolutional layers, number of filters, non-linearity, encoder EMA, and discount gamma.
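For a sense of how such a hyperparameter table translates into a reproducible experiment configuration, the sketch below collects the categories named above into a single config object. All concrete values are illustrative placeholders, not the paper's reported settings (those are in the paper's Table 3), and the class and field names are this sketch's own.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class BaselineConfig:
    """Hyperparameter categories from the paper's Table 3.

    Every default below is a hypothetical placeholder chosen for
    illustration only; consult the paper for the actual values.
    """
    observation_shape: Tuple[int, int, int] = (3, 84, 84)  # placeholder
    latent_dimension: int = 50                             # placeholder
    replay_buffer_size: int = 100_000                      # placeholder
    initial_steps: int = 1_000                             # placeholder
    stacked_frames: int = 3                                # placeholder
    action_repeat: int = 2                                 # placeholder
    sac_hidden_units: int = 1024                           # placeholder
    evaluation_episodes: int = 10                          # placeholder
    optimizer: str = "Adam"                                # placeholder
    adam_betas: Tuple[float, float] = (0.9, 0.999)         # placeholder
    learning_rate: float = 1e-3                            # placeholder
    batch_size: int = 128                                  # placeholder
    q_function_ema: float = 0.01                           # placeholder
    critic_target_update_freq: int = 2                     # placeholder
    num_conv_layers: int = 4                               # placeholder
    num_filters: int = 32                                  # placeholder
    non_linearity: str = "ReLU"                            # placeholder
    encoder_ema: float = 0.05                              # placeholder
    discount_gamma: float = 0.99                           # placeholder

# Instantiating with defaults; any field can be overridden per ablation.
cfg = BaselineConfig(latent_dimension=64)
print(cfg.latent_dimension, cfg.discount_gamma)
```

Keeping every hyperparameter in one typed dataclass makes it easy to log the full configuration alongside results, which is exactly the kind of record the reproducibility checklist is probing for.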