Causal Information Prioritization for Efficient Reinforcement Learning

Authors: Hongye Cao, Fan Feng, Tianpei Yang, Jing Huo, Yang Gao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To fully assess the effectiveness of CIP, we conduct extensive experiments across 39 tasks in 5 diverse continuous control environments, encompassing both locomotion and manipulation skills learning with pixel-based and sparse reward settings. Experimental results demonstrate that CIP consistently outperforms existing RL methods across a wide range of scenarios.
Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) University of California, San Diego; (3) MBZUAI; (4) School of Intelligence Science and Technology, Nanjing University
Pseudocode | Yes | Algorithm 1 illustrates the complete CIP pipeline.
Open Source Code | Yes | We provide the core code of CIP in the supplementary material.
Open Datasets | Yes | We evaluate CIP on 5 continuous control environments, including MuJoCo (Todorov et al., 2012), DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2020), Adroit Hand (Rajeswaran et al., 2018), and sparse reward setting environments in Meta-World.
Dataset Splits | No | The paper uses established environments and tasks for reinforcement learning but does not specify explicit training/validation/test splits (e.g., as percentages or sample counts) for a fixed dataset; such splits are less common in RL, where data is generated through interaction.
Hardware Specification | Yes | All experiments of this approach are implemented on 2 Intel(R) Xeon(R) Gold 6430 and 2 NVIDIA Tesla A800 GPUs.
Software Dependencies | No | In our code, we have utilized the following libraries, each covered by its respective license agreements: PyTorch (BSD 3-Clause New or Revised License), NumPy (BSD 3-Clause New or Revised License), TensorFlow (Apache License 2.0), Meta-World (MIT License), MuJoCo (Apache License 2.0), DeepMind Control (Apache License 2.0), Adroit Hand (Creative Commons License 3.0)
Experiment Setup | Yes | We present the detailed hyperparameter settings of the proposed method CIP across all 5 environments in Table 3. Additionally, we set the target update interval to 2. For a fair comparison, the hyperparameters of the baseline methods (SAC (Haarnoja et al., 2018), BAC (Ji et al., 2024b), ACE (Ji et al., 2024a)) follow the same settings in the experiments. Our analysis reveals that setting α to 0.2 yields optimal performance across all tasks, which motivated our choice of α = 0.2 for all experiments.