Learning Representations for Pixel-based Control: What Matters and Why?
Authors: Manan Tomar, Utkarsh Aashu Mishra, Amy Zhang, Matthew E. Taylor
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments across multiple settings, including the MuJoCo domains from DMC Suite (Tassa et al., 2018) with natural distractors (Zhang et al., 2018; Kay et al., 2017; Stone et al., 2021), and Atari100K (Kaiser et al., 2019) from ALE (Bellemare et al., 2013). |
| Researcher Affiliation | Collaboration | Manan Tomar (University of Alberta, Amii); Utkarsh A. Mishra (University of Alberta, Amii); Amy Zhang (FAIR, Menlo Park; University of California, Berkeley); Matthew E. Taylor (University of Alberta, Amii) |
| Pseudocode | No | The paper describes methods and losses using textual descriptions and mathematical equations, such as LBaseline and LDREAMER, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available: https://github.com/UtkarshMishra04/pixel-representations-RL |
| Open Datasets | Yes | We conduct experiments across multiple settings, including the MuJoCo domains from DMC Suite (Tassa et al., 2018) with natural distractors (Zhang et al., 2018; Kay et al., 2017; Stone et al., 2021), and Atari100K (Kaiser et al., 2019) from ALE (Bellemare et al., 2013). ... Kinetics dataset: https://github.com/Showmax/kinetics-downloader |
| Dataset Splits | No | The paper describes training and evaluating agents within reinforcement learning environments (DMC Suite and Atari100K) over 'environment steps' or 'episodes', which is standard for RL. However, it does not specify explicit static training/test/validation dataset splits with percentages or sample counts, as such splits are not typically applicable in the same way as for supervised learning tasks using fixed datasets. |
| Hardware Specification | Yes | All experiments were conducted on one of the following system configurations: 1. 6 CPU cores of Intel Gold 6148 Skylake @ 2.4 GHz, one NVIDIA V100SXM2 (16 GB memory) GPU, and 84 GB RAM. 2. 6 CPU cores of Intel Xeon Gold 5120 Skylake @ 2.2 GHz, one NVIDIA V100 Volta (16 GB HBM2 memory) GPU, and 84 GB RAM. |
| Software Dependencies | No | The paper mentions using the SAC algorithm and references existing open-source implementations for certain architectures, but it does not specify versions for core software dependencies like Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA libraries. |
| Experiment Setup | Yes | The full set of hyperparameters used for the baseline experiments are provided in Table 3 below. Table 3: Hyperparameters for Baseline and related ablations. (lists observation shape, latent dimension, replay buffer size, initial steps, stacked frames, action repeat, SAC hidden units, transition network details, reward network details, evaluation episodes, optimizer, beta values, learning rates, batch size, Q function EMA, critic target update freq, convolutional layers, number of filters, non-linearity, encoder EMA, discount gamma). |
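The hyperparameter names quoted from the paper's Table 3 could be collected into a single configuration dictionary for a reimplementation. The sketch below uses a subset of those names; every value is a hypothetical placeholder, not the paper's actual setting, which must be read from Table 3 itself:

```python
# Sketch of a baseline experiment config. Parameter NAMES follow the list
# quoted from Table 3 of the paper; all VALUES are hypothetical placeholders.
baseline_config = {
    "observation_shape": (9, 84, 84),  # placeholder: 3 stacked 84x84 RGB frames
    "latent_dimension": 50,            # placeholder
    "replay_buffer_size": 100_000,     # placeholder
    "initial_steps": 1_000,            # placeholder
    "stacked_frames": 3,               # placeholder
    "action_repeat": 2,                # placeholder
    "sac_hidden_units": 1024,          # placeholder
    "evaluation_episodes": 10,         # placeholder
    "optimizer": "Adam",               # placeholder
    "learning_rate": 1e-3,             # placeholder
    "batch_size": 128,                 # placeholder
    "discount_gamma": 0.99,            # placeholder
}

# A config dict like this makes it easy to log the exact setup alongside results.
print(sorted(baseline_config))
```

Keeping every knob in one flat dictionary mirrors how checklist items like "Experiment Setup" are audited: a reviewer can diff the logged config against the paper's table entry by entry.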