Zero Shot Generalization of Vision-Based RL Without Data Augmentation
Authors: Sumeet Batra, Gaurav S. Sukhatme
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017). |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Southern California, Los Angeles, USA. Correspondence to: Sumeet Batra <EMAIL>, Gaurav Sukhatme <EMAIL>. |
| Pseudocode | Yes | A.8. ALDA Pseudocode. Algorithm 1 ALDA Forward Pass. Algorithm 2 Associative Latent Dynamics. |
| Open Source Code | No | The paper states "Our SAC implementation is based on (Yarats & Kostrikov, 2020)," which refers to a third-party implementation, but it provides no access information or explicit statement about releasing code for the methodology described in this paper. |
| Open Datasets | Yes | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training... We do not expect to outperform SVEA since it uses additional data sampled from a dataset of 1.8 million diverse real-world scenes, likely putting the DMCGB evaluation tasks in-distribution... images sampled from the Places (Zhou et al., 2017) dataset. |
| Dataset Splits | Yes | We train on four challenging tasks from the DeepMind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017). |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states: "Our SAC implementation is based on (Yarats & Kostrikov, 2020)." This reference is to a PyTorch implementation but does not specify the version of PyTorch or any other software libraries used with their version numbers. |
| Experiment Setup | Yes | A.7.3. HYPERPARAMETERS. "We list a set of common hyperparameters that are used in all domains." Table 1 (Common hyperparameters for SAC and ALDA): Replay buffer capacity 1e6; Batch size 128; Latent model temperature β 100; Number of latents \|zd\| 12; Number of values per latent Vj 12; Encoder weight decay λθ 0.1; Decoder weight decay λϕ 0.1; Frame stack 3; Action repeat 2 for finger spin, otherwise 4; Episode length 100; Observation space (9 x 64 x 64); Optimizer Adam; Actor/Critic learning rate 1e-3; Encoder/Decoder learning rate 1e-3; Latent model learning rate 1e-3; Temperature learning rate 1e-4; Actor update frequency 2; Critic update frequency 2; Discount γ 0.99 |
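The common hyperparameters quoted in the Experiment Setup row can be collected into a single configuration dictionary. This is a minimal sketch for reproducibility purposes only: the key names are our own shorthand, not identifiers from the authors' code, and only the values come from the paper's Table 1.

```python
# Hypothetical config dict mirroring Table 1 (common SAC/ALDA hyperparameters).
# Key names are illustrative; values are taken verbatim from the paper.
COMMON_HPARAMS = {
    "replay_buffer_capacity": int(1e6),
    "batch_size": 128,
    "latent_model_temperature_beta": 100,
    "num_latents": 12,              # |zd|
    "values_per_latent": 12,        # Vj
    "encoder_weight_decay": 0.1,    # λθ
    "decoder_weight_decay": 0.1,    # λϕ
    "frame_stack": 3,
    "action_repeat": {"finger_spin": 2, "default": 4},
    "episode_length": 100,
    "obs_shape": (9, 64, 64),       # 3 stacked RGB frames at 64x64
    "optimizer": "Adam",
    "actor_critic_lr": 1e-3,
    "encoder_decoder_lr": 1e-3,
    "latent_model_lr": 1e-3,
    "temperature_lr": 1e-4,
    "actor_update_freq": 2,
    "critic_update_freq": 2,
    "discount_gamma": 0.99,
}

# Consistency check: the 9-channel observation depth equals
# frame_stack (3) x 3 RGB channels per frame.
assert COMMON_HPARAMS["obs_shape"][0] == 3 * COMMON_HPARAMS["frame_stack"]
```

One detail worth noting when re-implementing: action repeat is task-dependent (2 for finger spin, 4 elsewhere), so the effective environment-step budget differs across tasks for the same number of agent decisions.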