Causal Information Prioritization for Efficient Reinforcement Learning

Authors: Hongye Cao, Fan Feng, Tianpei Yang, Jing Huo, Yang Gao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To fully assess the effectiveness of CIP, we conduct extensive experiments across 39 tasks in 5 diverse continuous control environments, encompassing both locomotion and manipulation skills learning with pixel-based and sparse reward settings. Experimental results demonstrate that CIP consistently outperforms existing RL methods across a wide range of scenarios.
Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) University of California, San Diego; (3) MBZUAI; (4) School of Intelligence Science and Technology, Nanjing University
Pseudocode | Yes | Algorithm 1 illustrates the complete CIP pipeline.
Open Source Code | Yes | We provide the core code of CIP in the supplementary material.
Open Datasets | Yes | We evaluate CIP on 5 continuous control environments, including MuJoCo (Todorov et al., 2012), DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2020), Adroit Hand (Rajeswaran et al., 2018), and sparse reward setting environments in Meta-World.
Dataset Splits | No | The paper uses established environments and tasks for reinforcement learning but does not specify explicit training/validation/test splits (e.g., as percentages or sample counts) for a fixed dataset; such splits are less common in RL, where data is generated through interaction.
Hardware Specification | Yes | All experiments of this approach are implemented on 2 Intel(R) Xeon(R) Gold 6430 and 2 NVIDIA Tesla A800 GPUs.
Software Dependencies | No | In our code, we have utilized the following libraries, each covered by its respective license agreements: PyTorch (BSD 3-Clause New or Revised License), NumPy (BSD 3-Clause New or Revised License), TensorFlow (Apache License 2.0), Meta-World (MIT License), MuJoCo (Apache License 2.0), DeepMind Control (Apache License 2.0), Adroit Hand (Creative Commons License 3.0)
Experiment Setup | Yes | We present the detailed hyperparameter settings of the proposed method CIP across all 5 environments in Table 3. Additionally, we set the target update interval to 2. For a fair comparison, the hyperparameters of the baseline methods (SAC (Haarnoja et al., 2018), BAC (Ji et al., 2024b), ACE (Ji et al., 2024a)) follow the same settings in the experiments. Our analysis reveals that setting α to 0.2 yields optimal performance across all tasks, which motivated our choice of α = 0.2 for all experiments.