Self-Consistent Model-based Adaptation for Visual Reinforcement Learning
Authors: Xinning Zhou, Chengyang Ying, Yao Feng, Hang Su, Jun Zhu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency. |
| Researcher Affiliation | Academia | Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University |
| Pseudocode | No | The paper describes methods and derivations using mathematical formulas and textual explanations, but it does not include a clearly labeled pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not explicitly state that source code is released, nor does it provide a link to a code repository. It mentions 'More details can be found in Appendix C.1.' and 'further training details provided in Appendix C.' but these do not refer to code availability. |
| Open Datasets | Yes | To measure the effectiveness of SCMA, we follow the settings from the commonly adopted DMControl GB [Hansen and Wang, 2021; Hansen et al., 2021; Bertoin et al., 2022], DMControl View [Yang et al., 2024], and RL-ViGen [Yuan et al., 2024]. ... Following the official design [Hansen and Wang, 2021], the augmentation-based methods use random overlay with images from Places365 [Zhou et al., 2017]. |
| Dataset Splits | No | The paper mentions using specific environments like DMControl GB, DMControl View, and RL-ViGen, and discusses pre-training and adaptation phases, but it does not provide explicit details on how datasets for these environments are split into training, validation, or test sets (e.g., specific percentages or sample counts) within the main text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific libraries with their versions). |
| Experiment Setup | Yes | Adaptation-based methods will first be pre-trained in the clean environments for 1M timesteps and then adapt to the distracting environments for 0.1M timesteps (0.4M for video hard and 0.5M for RL-ViGen). |