Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning

Authors: Caleb Chuck, Fan Feng, Carl Qi, Chang Shi, Siddhant Agarwal, Amy Zhang, Scott Niekum

ICLR 2025

Reproducibility Assessment (Variable / Result / Supporting Evidence)
Research Type: Experimental
Evidence: "Empirically, we demonstrate that NCII matches the performance achieved by prior work in the Random Vectors domain (Hwang et al., 2023; Chuck et al., 2024b), demonstrating the efficacy of the method, and performs well in simulated robotics using Robosuite (Zhu et al., 2020), Robot Air Hockey (Chuck et al., 2024a), and Franka Kitchen (Gupta et al., 2019). Furthermore, we demonstrate that HInt improves sample efficiency by up to 4x in these domains as goal-conditioned tasks." "We provide an empirical evaluation of NCII compared to existing actual cause inference methods in Random Vectors, Spriteworld, Robosuite, Robot Air Hockey, and Franka Kitchen domains when using ground truth variable state." "We evaluate the efficiency of HInt applied to GCRL in Spriteworld, Robosuite, Robot Air Hockey, and Franka Kitchen."
Researcher Affiliation: Academia
Caleb Chuck (1), Fan Feng (2, 3), Carl Qi (1), Chang Shi (1), Siddhant Agarwal (1), Amy Zhang (1), Scott Niekum (4)
(1) The University of Texas at Austin; (2) University of California San Diego; (3) MBZUAI; (4) University of Massachusetts Amherst
Pseudocode: Yes
Evidence: "Algorithm 1: Hindsight relabeling using Interactions (HInt)"
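The paper's Algorithm 1 is not reproduced in this review. As a rough illustration of the idea it names (HER-style hindsight relabeling restricted to transitions where an interaction is detected), a minimal sketch follows. The `detect_interaction` predicate, `goal_of` mapping, and trajectory layout are assumptions for illustration, not the authors' implementation.

```python
import random

def hint_relabel(trajectory, detect_interaction, goal_of):
    """HER-style hindsight relabeling gated by an interaction test (sketch).

    trajectory: list of (state, action, next_state) tuples.
    detect_interaction: predicate on (state, action, next_state); only
        transitions flagged as interactions donate hindsight goals.
    goal_of: maps a state to its goal-space representation.
    Returns relabeled (state, action, goal, reward) tuples.
    """
    # Collect candidate goals only from achieved states that follow a
    # detected interaction -- the filtering step that distinguishes this
    # sketch from vanilla hindsight experience replay.
    candidates = [goal_of(ns) for (s, a, ns) in trajectory
                  if detect_interaction(s, a, ns)]
    relabeled = []
    for (s, a, ns) in trajectory:
        if not candidates:
            break  # no interactions anywhere: nothing to relabel toward
        g = random.choice(candidates)
        reward = 1.0 if goal_of(ns) == g else 0.0  # sparse goal-reached reward
        relabeled.append((s, a, g, reward))
    return relabeled
```

For example, on a toy trajectory of scalar states where an "interaction" is any jump of at least 2, only the jump transition donates a hindsight goal, and only the transition that reaches it earns reward 1.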
Open Source Code: Yes
Evidence: "We provide the implementation of NCII, HInt, and other baselines in the appendix code."
Open Datasets: No
Evidence: "We collect a fixed dataset of 1M states for random DAG and 2M states for Spriteworld, Robosuite Air Hockey, and Franka Kitchen. ... installation instructions, detailed settings, and configurations for the data generation process for the benchmarks and datasets can be found in Appendix H and the appendix code."
Dataset Splits: No
The paper mentions collecting fixed datasets of a specific size (e.g., "1M states for random DAG and 2M states for Spriteworld") and running experiments with multiple seeds ("All null experiments were collected with 10 seeds between 0-9. All RL experiments used 5 seeds between 0-4."). However, it does not provide specific details on how these datasets were split into training, validation, or test sets, nor does it refer to any standard predefined splits.
Hardware Specification: Yes
Evidence: "All null experiments were collected with 10 seeds between 0-9. All RL experiments used 5 seeds between 0-4. The experiments were conducted on machines of the following configurations:"
- 4x Nvidia A40 GPU; 8x Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
- 4x Quadro RTX 6000 GPU; 4x Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
- 4x Nvidia 4090 GPU; 8x Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
- 2x Nvidia A100 GPU; 8x Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
Software Dependencies: No
The paper states, "We provide the implementation of NCII, HInt, and other baselines in the appendix code." While this confirms code availability, it does not specify versions for programming languages, libraries, or other software components crucial for replication (e.g., Python version, PyTorch/TensorFlow version, specific solver versions).
Experiment Setup: Yes
Evidence: "In this section we describe the hyperparameters and training details for NCII and HInt."
- Table 4 (Forward/Inference Model): Encoding dim 512; Hidden layers 3 x 512; Activation LeakyReLU
- Table 5 (Null Parameters): ϵ_null = 1 (log-likelihood space); Minimum normalized distribution variance 0.001; Distribution: diagonal Gaussian; Learning rate 1e-4
- Table 6 (Reinforcement Learning Parameters): Algorithm DDPG; Batch size 1024; Optimizer Adam; Actor/critic learning rate 1e-4; Exploration noise 0.1; γ = 0.9; Hidden layers 2 x 512; τ = 0.005
- Table 7 (Domain-Specific Parameters): Timeout, Normalized goal epsilon, Null train steps, RL train steps (reported per domain)
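The quoted hyperparameters can be restated as a config fragment. The values below are taken verbatim from the paper's Tables 4-6 as quoted above; the dictionary layout and key names are illustrative, not the authors' code.

```python
# Hyperparameters quoted from the paper's Tables 4-6.
# Key names and structure are assumptions for readability.

NCII_MODEL = {
    "encoding_dim": 512,
    "hidden_layers": [512, 512, 512],  # "Hidden 3 512"
    "activation": "LeakyReLU",
}

NULL_PARAMS = {
    "epsilon_null": 1.0,               # in log-likelihood space
    "min_normalized_variance": 1e-3,
    "distribution": "diagonal_gaussian",
    "learning_rate": 1e-4,
}

RL_PARAMS = {
    "algorithm": "DDPG",
    "batch_size": 1024,
    "optimizer": "Adam",
    "actor_critic_lr": 1e-4,
    "exploration_noise": 0.1,
    "gamma": 0.9,
    "hidden_layers": [512, 512],       # "Hidden Layers 2 512"
    "tau": 0.005,
}
```

Table 7's per-domain values (timeout, goal epsilon, train steps) vary by environment and are not reproduced here.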