Zero Shot Generalization of Vision-Based RL Without Data Augmentation

Authors: Sumeet Batra, Gaurav S. Sukhatme

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017).
Researcher Affiliation | Academia | 1Department of Computer Science, University of Southern California, Los Angeles, USA. Correspondence to: Sumeet Batra <EMAIL>, Gaurav Sukhatme <EMAIL>.
Pseudocode | Yes | A.8. ALDA Pseudocode: Algorithm 1 (ALDA Forward Pass) and Algorithm 2 (Associative Latent Dynamics).
Open Source Code | No | The paper states "Our SAC implementation is based on (Yarats & Kostrikov, 2020)," which refers to a third-party implementation, but it provides no access information or explicit statement about releasing the code for the methodology described in this paper.
Open Datasets | Yes | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training... We do not expect to outperform SVEA since it uses additional data sampled from a dataset of 1.8 million diverse real-world scenes, likely putting the DMCGB evaluation tasks in-distribution... images sampled from the Places (Zhou et al., 2017) dataset.
Dataset Splits | Yes | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017).
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper states: "Our SAC implementation is based on (Yarats & Kostrikov, 2020)." This reference is to a PyTorch implementation, but no version of PyTorch or of any other software library is specified.
Experiment Setup | Yes | A.7.3. HYPERPARAMETERS. We list a set of common hyperparameters that are used in all domains. Table 1. Common hyperparameters for SAC and ALDA: Replay buffer capacity 1e6; Batch size 128; Latent model temperature β 100; Number of latents |zd| 12; Number of values per latent Vj 12; Encoder weight decay λθ 0.1; Decoder weight decay λϕ 0.1; Frame stack 3; Action repeat 2 for finger spin, otherwise 4; Episode length 100; Observation space (9 × 64 × 64); Optimizer Adam; Actor/Critic learning rate 1e-3; Encoder/Decoder learning rate 1e-3; Latent model learning rate 1e-3; Temperature learning rate 1e-4; Actor update frequency 2; Critic update frequency 2; Discount γ 0.99.
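The Table 1 hyperparameters reported above can be collected into a single configuration object, which makes the experiment setup easier to audit for reproduction. This is a minimal sketch: every value is taken from the paper's Table 1, but the key names and the `action_repeat_for` helper are our own illustrative choices, not identifiers from the authors' code.

```python
# Common SAC/ALDA hyperparameters from Table 1 of the paper.
# Key names are illustrative; values are as reported by the authors.
ALDA_COMMON_CONFIG = {
    "replay_buffer_capacity": 1_000_000,   # 1e6
    "batch_size": 128,
    "latent_model_temperature_beta": 100,
    "num_latents": 12,                     # |zd|
    "values_per_latent": 12,               # Vj
    "encoder_weight_decay": 0.1,           # λθ
    "decoder_weight_decay": 0.1,           # λϕ
    "frame_stack": 3,
    "episode_length": 100,
    "obs_shape": (9, 64, 64),              # 3 stacked RGB frames of 64x64
    "optimizer": "adam",
    "actor_critic_lr": 1e-3,
    "encoder_decoder_lr": 1e-3,
    "latent_model_lr": 1e-3,
    "temperature_lr": 1e-4,
    "actor_update_freq": 2,
    "critic_update_freq": 2,
    "discount": 0.99,
}


def action_repeat_for(task: str) -> int:
    """Per-task exception from Table 1: finger spin uses action repeat 2,
    all other tasks use 4. The task-name string is a hypothetical key."""
    return 2 if task == "finger_spin" else 4
```

Note that the observation shape (9, 64, 64) is consistent with the frame stack of 3: three stacked 3-channel RGB frames give 9 channels.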
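The evaluation protocol described in the table (train on DMC tasks, periodically roll out the current policy zero-shot in the color hard and distracting cs environments) can be sketched as a generic training loop. Everything here is a hypothetical scaffold: the environment constructors, the `agent.train_step`/`agent.act` interface, and the step signature are placeholder assumptions, not the authors' API.

```python
from typing import Callable


def evaluate(policy: Callable, env, episodes: int = 10) -> float:
    """Average undiscounted return of `policy` over `episodes` rollouts.
    Assumes a hypothetical env with reset() -> obs and
    step(action) -> (obs, reward, done)."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
    return total / episodes


def train_with_periodic_eval(agent, train_env, eval_envs, steps, eval_every):
    """Train on the in-distribution env; every `eval_every` steps, run the
    frozen policy zero-shot (no adaptation, no augmentation) in each
    held-out distribution-shift env, e.g. color_hard and distracting_cs."""
    obs = train_env.reset()
    for step in range(1, steps + 1):
        obs = agent.train_step(train_env, obs)
        if step % eval_every == 0:
            for name, env in eval_envs.items():
                print(f"step {step} | {name}: {evaluate(agent.act, env):.1f}")
```

The key property being measured is that the evaluation environments are never seen during training, so the reported returns are genuinely zero-shot.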