Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Authors: Caleb Chuck, Fan Feng, Carl Qi, Chang Shi, Siddhant Agarwal, Amy Zhang, Scott Niekum
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that NCII matches the performance achieved by prior work in the Random Vectors domain (Hwang et al., 2023; Chuck et al., 2024b), demonstrating the efficacy of the method, and performs well in simulated robotics using Robosuite (Zhu et al., 2020), Robot Air Hockey (Chuck et al., 2024a), and Franka Kitchen (Gupta et al., 2019). Furthermore, we demonstrate that HInt improves sample efficiency by up to 4× in these domains as goal-conditioned tasks. We provide an empirical evaluation of NCII compared to existing actual cause inference methods in Random Vectors, Spriteworld, Robosuite, Robot Air Hockey, and Franka Kitchen domains when using ground truth variable state. We evaluate the efficiency of HInt applied to GCRL in Spriteworld, Robosuite, Robot Air Hockey, and Franka Kitchen. |
| Researcher Affiliation | Academia | Caleb Chuck¹, Fan Feng²·³, Carl Qi¹, Chang Shi¹, Siddhant Agarwal¹, Amy Zhang¹, Scott Niekum⁴ — ¹The University of Texas at Austin; ²University of California San Diego; ³MBZUAI; ⁴University of Massachusetts Amherst |
| Pseudocode | Yes | Algorithm 1 Hindsight relabeling using Interactions (HInt) |
| Open Source Code | Yes | We provide the implementation of NCII, HInt, and other baselines in the appendix code. |
| Open Datasets | No | We collect a fixed dataset of 1M states for random DAG and 2M states for Spriteworld, Robosuite, Air Hockey, and Franka Kitchen. ... installation instructions, detailed settings, and configurations for the data generation process for the benchmarks and datasets can be found in Appendix H and the appendix code. |
| Dataset Splits | No | The paper mentions collecting fixed datasets of a specific size (e.g., '1M states for random DAG and 2M states for Spriteworld') and running experiments with multiple seeds ('All null experiments were collected with 10 seeds between 0-9. All RL experiments used 5 seeds between 0-4.'). However, it does not provide specific details on how these datasets were split into training, validation, or test sets, nor does it refer to any standard predefined splits. |
| Hardware Specification | Yes | All null experiments were collected with 10 seeds between 0-9. All RL experiments used 5 seeds between 0-4. The experiments were conducted on machines of the following configurations: 4× Nvidia A40 GPU with 8× Intel Xeon Gold 6342 CPU @ 2.80GHz; 4× Quadro RTX 6000 GPU with 4× Intel Xeon Gold 6342 CPU @ 2.80GHz; 4× Nvidia 4090 GPU with 8× Intel Xeon Gold 6342 CPU @ 2.80GHz; 2× Nvidia A100 GPU with 8× Intel Xeon Gold 6342 CPU @ 2.80GHz |
| Software Dependencies | No | The paper states, 'We provide the implementation of NCII, HInt, and other baselines in the appendix code.' While this confirms code availability, it does not specify versions for programming languages, libraries, or other software components crucial for replication (e.g., Python version, PyTorch/TensorFlow version, specific solver versions). |
| Experiment Setup | Yes | In this section we describe the hyperparameters and training details for NCII and HInt. ... Encoding dim 512; Hidden 3×512; Activation LeakyReLU (Table 4: Forward/Inference Model) ... ϵ_null = 1 (log-likelihood space); Minimum normalized distribution variance 0.001; Distribution: diagonal Gaussian; Learning rate 1×10⁻⁴ (Table 5: Null Parameters) ... Algorithm DDPG; Batch size 1024; Optimizer Adam; Actor/critic learning rate 1×10⁻⁴; Exploration noise 0.1; γ = 0.9; Hidden layers 2×512; τ = 0.005 (Table 6: Reinforcement Learning Parameters) ... Domain-specific parameters: Timeout, Normalized Goal Epsilon, Null Train Steps, RL Train Steps (Table 7: Domain Specific Parameters) |
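For concreteness, the DDPG hyperparameters quoted above (Table 6) can be collected into a small configuration sketch. The values are taken verbatim from the quoted table; the dictionary keys and the `soft_update` helper are our own illustrative naming, not the paper's code.

```python
# Sketch of the RL hyperparameters reported in Table 6 (DDPG).
# Key names are illustrative; values are as quoted in the table.
ddpg_config = {
    "algorithm": "DDPG",
    "batch_size": 1024,
    "optimizer": "Adam",
    "actor_lr": 1e-4,             # actor/critic learning rate 1×10⁻⁴
    "critic_lr": 1e-4,
    "exploration_noise": 0.1,
    "gamma": 0.9,                 # discount factor γ
    "hidden_layers": [512, 512],  # 2 hidden layers of width 512
    "tau": 0.005,                 # target-network soft-update rate τ
}

def soft_update(target_params, source_params, tau=ddpg_config["tau"]):
    """Polyak averaging of target-network parameters, as standard in DDPG:
    θ_target ← τ·θ_source + (1 − τ)·θ_target."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_params, source_params)]
```

The τ = 0.005 soft update is the conventional DDPG target-network stabilization; with these values, a target parameter moves 0.5% of the way toward the online parameter per update.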