Permutation-Free High-Order Interaction Tests
Authors: Zhaolu Liu, Robert Peach, Mauricio Barahona
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present implementations of the tests and showcase their efficacy and scalability through synthetic datasets. We also show applications inspired by causal discovery and feature selection, which highlight both the importance of high-order interactions in data and the need for efficient computational methods. Section 5. Experiments: We evaluate our proposed permutation-free tests using a range of experiments. For all cases, under the null hypothesis of each test, we confirm standard normality and controlled type-I errors of the corresponding ×dHSIC, ×LI & ×SI statistics (see Appendix D, Figs. 10–11 for examples). Below we focus on the statistical power and computational efficiency of these methods compared to their permutation-based counterparts. |
| Researcher Affiliation | Academia | 1Department of Mathematics, Imperial College London, United Kingdom 2Department of Neurology, University Hospital Würzburg, Germany 3Department of Brain Sciences, Imperial College London, United Kingdom. Correspondence to: Mauricio Barahona <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Lancaster Interaction Test at d = 3 |
| Open Source Code | Yes | Code Availability Code to implement the permutation-free tests in this paper is available at https://github.com/barahona-research-group/PermFree-HOI.git. |
| Open Datasets | Yes | Here, we use two datasets (A and B) from Sejdinovic et al. (2013a)... Following closely Simulation 4 from Pfister et al. (2018), we simulate n samples from one DAG with d = 4 variables (see Appendix D). ...daily returns of stocks in the S&P 500 from 2020 to 2024. |
| Dataset Splits | Yes | To extend the permutation-free pairwise independence measure of Shekhar et al. (2023) to the multivariate setting, we first define the unnormalised permutation-free dHSIC using the data-splitting technique: $\times\mathrm{dHSIC} = \langle \hat{\mu}^1_{P_{1\cdots d}} - \hat{\mu}^1_{P_1 \otimes \cdots \otimes P_d},\ \hat{\mu}^2_{P_{1\cdots d}} - \hat{\mu}^2_{P_1 \otimes \cdots \otimes P_d} \rangle_{\mathrm{HS}}$, where $\hat{\mu}^1(\cdot)$ and $\hat{\mu}^2(\cdot)$ are the empirical embeddings in dHSIC, estimated from two disjoint sample sets: the first half $S^d_1 = \{(x^1_i, \ldots, x^d_i) : 1 \le i \le n\}$ and the second half $S^d_2 = \{(x^1_i, \ldots, x^d_i) : n+1 \le i \le 2n\}$ of the i.i.d. samples. |
| Hardware Specification | Yes | The CPU time for computing high-order interaction percentages took 8 hours without any parallelisation on a 2015 iMac with 4 GHz Quad-Core Intel Core i7 processor and 32 GB 1867 MHz DDR3 memory. |
| Software Dependencies | No | The paper does not explicitly list software names with specific version numbers (e.g., Python 3.x, PyTorch 1.x) in the main text or appendices, beyond general mentions like 'scikit-learn: Machine learning in Python' (Pedregosa et al., 2011). |
| Experiment Setup | Yes | Unless noted otherwise, the significance level is set to α = 0.05, and we use Gaussian kernels with bandwidth given by the median heuristic. Additional details of each experiment are available in Appendix E. Each experiment uses n = 500 samples... we compare the accuracy of our permutation-free xd HSIC against the permutation-based d HSIC (using p = 200) to recover the ground truth DAG. |
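The data-splitting construction quoted in the Dataset Splits row can be illustrated in the pairwise case ($d = 2$): each half of the sample gives an independent estimate of the embedding difference $\hat{\mu}_{P_{XY}} - \hat{\mu}_{P_X \otimes P_Y}$, and the statistic is their inner product, which expands into four cross-kernel terms. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names (`cross_hsic`, `median_heuristic`, `gaussian_kernel`) are hypothetical, and the paper's $\times\mathrm{dHSIC}$ generalises the same idea to $d$ variables. It also uses the Gaussian kernel with median-heuristic bandwidth mentioned in the Experiment Setup row.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Cross Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 h^2))."""
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def median_heuristic(z):
    """Bandwidth = median of nonzero pairwise Euclidean distances."""
    z = z.reshape(len(z), -1)
    dists = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    return np.median(dists[dists > 0])

def cross_hsic(x, y):
    """Data-splitting HSIC sketch for d = 2 (hypothetical helper, not the paper's code).

    Splits the sample into two halves S1, S2 and returns the inner product of the
    two half-sample estimates of mu_{P_XY} - mu_{P_X (x) P_Y}, expanded into
    four cross-kernel terms.
    """
    m = len(x) // 2
    x1, x2 = x[:m], x[m:2 * m]
    y1, y2 = y[:m], y[m:2 * m]
    K = gaussian_kernel(x1, x2, median_heuristic(x))  # X cross-kernel between halves
    L = gaussian_kernel(y1, y2, median_heuristic(y))  # Y cross-kernel between halves
    term_joint = (K * L).mean()                            # <mu1_XY, mu2_XY>
    term_mixed1 = (K.mean(axis=1) * L.mean(axis=1)).mean() # <mu1_XY, mu2_XxY>
    term_mixed2 = (K.mean(axis=0) * L.mean(axis=0)).mean() # <mu1_XxY, mu2_XY>
    term_prod = K.mean() * L.mean()                        # <mu1_XxY, mu2_XxY>
    return term_joint - term_mixed1 - term_mixed2 + term_prod

rng = np.random.default_rng(0)
x = rng.normal(size=500)                  # n = 500, as in the experiment setup
y_dep = x + 0.1 * rng.normal(size=500)    # strongly dependent on x
y_ind = rng.normal(size=500)              # independent of x
print(cross_hsic(x, y_dep), cross_hsic(x, y_ind))
```

Because the two half-sample estimates are independent, the statistic has mean zero under independence, which is what allows the authors' normalised versions to be calibrated against a standard normal instead of a permutation null; here the dependent pair yields a clearly larger value than the independent one.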