Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations

Authors: Yupei Yang, Biwei Huang, Fan Feng, Xinyue Wang, Shikui Tu, Lei Xu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the generalization capability of CSR on a number of simulated and well-established datasets, including the CartPole, CoinRun and Atari environments, with detailed descriptions provided in Appendix D.3. For all these benchmarks, we evaluate the POMDP case, where the inputs are high-dimensional observations. Specifically, the evaluation focuses on answering the following key questions: Q1: Can CSR effectively detect and adapt to the two types of environmental changes? Q2: Does the incorporation of causal knowledge enhance the generalization performance? Q3: Is searching for the optimal expansion structure necessary? We compare our approach against several baselines: Dreamer (Hafner et al., 2023), which handles fixed tasks without integrating causal knowledge; AdaRL (Huang et al., 2021), which employs simple scenario-based policy adaptation without space expansion considerations; and the traditional model-free DQN (Mnih et al., 2015) and SPR (Schwarzer et al., 2020). Additionally, for the Atari games, we benchmark against the state-of-the-art method EfficientZero (Ye et al., 2021). All results are averaged over 5 runs; more implementation details can be found in Appendix D.
Researcher Affiliation | Academia | Yupei Yang^1, Biwei Huang^2, Fan Feng^{2,3}, Xinyue Wang^2, Shikui Tu^1, Lei Xu^1. ^1 Shanghai Jiao Tong University, ^2 University of California San Diego, ^3 Mohamed bin Zayed University of Artificial Intelligence. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Towards Generalizable RL through CSR
Open Source Code | Yes | Code is available at https://github.com/CMACH508/CSR.
Open Datasets | Yes | We evaluate the generalization capability of CSR on a number of simulated and well-established datasets, including the CartPole, CoinRun and Atari environments, with detailed descriptions provided in Appendix D.3. ... To illustrate, we reference the popular CoinRun environment (Cobbe et al., 2019). ... Atari 100K games, which includes 26 games with a budget of 400K environment steps (Kaiser et al., 2019).
Dataset Splits | Yes | For each of these games, we perform experiments across a sequence of four tasks, where each task randomly assigns a (mode, difficulty) pair. We then train these models on the source task and generalize them to downstream target tasks. ... For each task, agents are allowed to collect data over 100 episodes, each consisting of 256 time steps. ... Following Cobbe et al. (2019), we utilize a set of 500 levels as source tasks and generalize the agents to target tasks with higher difficulty levels outside these 500 levels.
Hardware Specification | Yes | All experiments are conducted using an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions specific software components and algorithms such as Dreamer, Gumbel-Softmax, the Adam optimizer, and REINFORCE gradients, but does not provide version numbers for its dependencies, which would be needed to reproduce the software environment exactly.
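The missing-versions issue above is easy to avoid at experiment time. Below is a minimal sketch of one way to record the exact versions of a run's dependencies alongside its results; the package names (`torch`, `gym`, `numpy`) are assumptions for illustration, since the paper does not state which libraries its implementation uses.

```python
# Sketch: snapshot installed dependency versions for a reproducibility log.
# The package list is hypothetical -- substitute the project's real dependencies.
import importlib.metadata as md


def freeze(packages):
    """Return a {package: version} mapping for each named distribution,
    marking packages that are absent from the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Emit requirements.txt-style pins that can be stored with the run.
    for pkg, ver in freeze(["torch", "gym", "numpy"]).items():
        print(f"{pkg}=={ver}")
```

Writing this snapshot to a file next to each checkpoint (or logging it with the run metadata) gives later readers the version information the assessment flags as missing.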
Experiment Setup | Yes | Table 4: Architecture and hyperparameters for the simulated environment. Table 5: Hyperparameters of CSR for CartPole, CoinRun and Atari games.