Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder
Authors: Yuhan Zhang, Guoqing Ma, Guangfu Hao, Liangxuan Guo, Yang Chen, Shan Yu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted across various domains, including the DeepMind Control Suite, Atari games, and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. |
| Researcher Affiliation | Academia | 1Laboratory of Brain Atlas and Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences 2School of Artificial Intelligence, University of Chinese Academy of Sciences 3School of Future Technology, University of Chinese Academy of Sciences 4Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences EMAIL |
| Pseudocode | No | The paper describes the methodology using textual descriptions and mathematical formulations, for example, in the 'Methodology' section and equations (2) to (7), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Experiments are conducted across various domains, including the DeepMind Control Suite, Atari games, and Memory Maze benchmarks... We pretrain APE on ImageNet-100, a randomly selected subset of the common ImageNet-1k (Deng et al. 2009), which has also been utilized in previous works (Kalantidis et al. 2020; Zhang, Zhu, and Yu 2023) for pretext tasks. |
| Dataset Splits | No | We pretrain APE on ImageNet-100... Results under the linear classification protocol are reported in Table 1. The augmentation with varying applied frequency during pretraining is denoted as the main augmentation strategy (fmain). In our method, the default fmain is random Gaussian blur, which proved to be the most promising setting in AdDA. ... We evaluate the sample efficiency of APE on DMC vision tasks for 1M environment steps. ... Following the common setup of Atari 100k, we set the environment steps to 40k in the tasks considered. ... In this paper, tasks on Memory Maze are trained for 2M steps due to limited computational resources. The paper mentions using a validation set for ImageNet-100 and environment steps for RL tasks, but does not specify explicit training/validation/test dataset splits (percentages, counts, or exact methods) for any of the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various algorithms and frameworks used (e.g., DreamerV3, DrQ-v2, ResNet18), but it does not specify any ancillary software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The first three layers of the encoder are frozen to maintain generalization ability, while parameters in the last layer are optimized together with the world model to adapt to environments with distribution shifts. The fixed hyperparameters are set to β1 = 0.5 and β2 = 0.1. ... where α is set to 0.8 for 7 compositions, and 1 for 3 compositions... We evaluate the sample efficiency of APE on DMC vision tasks for 1M environment steps. ... Following the common setup of Atari 100k, we set the environment steps to 40k in the tasks considered. ... In this paper, tasks on Memory Maze are trained for 2M steps... By default, the encoder uses the ResNet18 architecture (He et al. 2015). ... the latent dimensions of the three architectures are kept the same (4096), and both the ResNet18 and the ResNet50 architecture are pretrained on ImageNet-100 with the same fmain. |
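The partial fine-tuning described in the Experiment Setup row (first three encoder stages frozen, last stage trained jointly with the world model) can be sketched as below. This is a minimal illustration, not the authors' code: the `TinyEncoder` module, its `stage1`–`stage4` names, and the `freeze_early_stages` helper are all hypothetical stand-ins for a ResNet18-style encoder.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Minimal 4-stage stand-in for a ResNet18-style visual encoder."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)
        self.stage3 = nn.Conv2d(16, 32, 3, stride=2, padding=1)
        self.stage4 = nn.Conv2d(32, 64, 3, stride=2, padding=1)

    def forward(self, x):
        for stage in (self.stage1, self.stage2, self.stage3, self.stage4):
            x = torch.relu(stage(x))
        return x.flatten(1)

def freeze_early_stages(encoder, n_frozen=3):
    """Freeze the first n_frozen stages; leave later stages trainable."""
    stages = [encoder.stage1, encoder.stage2, encoder.stage3, encoder.stage4]
    for stage in stages[:n_frozen]:
        for p in stage.parameters():
            p.requires_grad = False

encoder = TinyEncoder()
freeze_early_stages(encoder)

# Only the still-trainable parameters would be handed to the optimizer
# (in the paper, jointly with the world-model parameters).
trainable = [p for p in encoder.parameters() if p.requires_grad]
```

The design choice this illustrates: freezing early stages preserves the generic features learned during ImageNet-100 pretraining, while the last stage remains free to adapt to the distribution shift of each RL environment.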