Granger Causal Interaction Skill Chains
Authors: Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate COInS on a robotic pushing task with obstacles, a challenging domain where other RL and HRL methods fall short. We also demonstrate the transferability of skills learned by COInS using variants of Breakout, a common RL benchmark, and show a 2-3x improvement in both sample efficiency and final performance compared to standard RL baselines. |
| Researcher Affiliation | Academia | Caleb Chuck (University of Texas at Austin), Kevin Black (University of California, Berkeley), Aditya Arjun (University of Texas at Austin), Yuke Zhu (University of Texas at Austin), Scott Niekum (University of Massachusetts Amherst) |
| Pseudocode | Yes | Algorithm box 9 describes the algorithm. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to their source code, nor does it include a link to a code repository. It mentions third-party tools like 'robosuite' but not their own implementation code. |
| Open Datasets | Yes | We systematically evaluate COInS in two domains: 1) an adapted version of the common Atari baseline Breakout (Bellemare et al., 2013) (Figure 1 and Appendix A.1) and 2) a simulated robot pushing domain in robosuite (Zhu et al., 2020). |
| Dataset Splits | No | The paper describes using a 'dataset D of state, action, next state tuples' collected during skill learning and mentions training and evaluation. However, it does not specify explicit train/test/validation splits for a fixed dataset, as it operates within continuous simulation environments where data is generated during training. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It mentions 'robotic pushing task' and 'simulated', but no specific hardware. |
| Software Dependencies | No | The paper mentions several software components and algorithms used (e.g., 'Rainbow', 'Soft Actor-Critic', 'Point Net style architecture', 'robosuite'). However, it does not provide specific version numbers for these components as they were used in the authors' implementation. |
| Experiment Setup | Yes | In Sections A.1 and A.2 we briefly describe some of the hyperparameter details related to the COIn S training process, but in this section, we provide additional context for the sensitivity of hyperparameters described in Table 6 and throughout the work. In general, the volume of hyperparameters comes from the reality that a hierarchical causal algorithm is a complex system. RL, causal learning, and hierarchy all contribute hyperparameters, and though the overall algorithm may not be sensitive to most of them, some value must be assigned to each. |
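The method assessed above builds skill chains from Granger-causal interactions between entities, learned from a dataset of state, action, next-state tuples. As background for that idea, the following is a minimal illustrative sketch of a pairwise Granger-style test on time-series data; it is not the authors' implementation, and the function names (`granger_residuals`, `granger_score`) and the linear one-lag model are assumptions chosen for brevity.

```python
import numpy as np

def granger_residuals(y, x=None, lag=1):
    """One-step least-squares prediction residuals of y from its own
    past (and, optionally, the past of x)."""
    rows, targets = [], []
    for t in range(lag, len(y)):
        feats = list(y[t - lag:t])
        if x is not None:
            feats += list(x[t - lag:t])
        rows.append(feats + [1.0])  # bias term
        targets.append(y[t])
    A, b = np.array(rows), np.array(targets)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return b - A @ coef

def granger_score(x, y, lag=1):
    """Log ratio of residual variances with vs. without x's past.
    Scores well above 0 suggest x Granger-causes y."""
    r_restricted = granger_residuals(y, None, lag)
    r_full = granger_residuals(y, x, lag)
    return np.log(np.var(r_restricted) / np.var(r_full))

# Toy demo: x drives y with a one-step delay; z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.9 * np.roll(x, 1) + rng.normal(scale=0.1, size=2000)
z = rng.normal(size=2000)
print(granger_score(x, y))  # large positive: x predicts y
print(granger_score(z, y))  # near zero: z adds no predictive power
```

In the paper's setting, the analogous comparison is made between transition models that do and do not condition on a candidate parent entity's state, rather than between scalar time series as here.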