Permutation-Free High-Order Interaction Tests

Authors: Zhaolu Liu, Robert Peach, Mauricio Barahona

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We present implementations of the tests and showcase their efficacy and scalability through synthetic datasets. We also show applications inspired by causal discovery and feature selection, which highlight both the importance of high-order interactions in data and the need for efficient computational methods. Section 5, Experiments: We evaluate our proposed permutation-free tests using a range of experiments. For all cases, under the null hypothesis of each test, we confirm standard normality and controlled type-I errors of the corresponding xdHSIC, xLI and xSI statistics (see Appendix D, Figs. 10 and 11 for examples). Below we focus on the statistical power and computational efficiency of these methods compared to their permutation-based counterparts.
Researcher Affiliation | Academia | 1 Department of Mathematics, Imperial College London, United Kingdom; 2 Department of Neurology, University Hospital Würzburg, Germany; 3 Department of Brain Sciences, Imperial College London, United Kingdom. Correspondence to: Mauricio Barahona <EMAIL>.
Pseudocode | Yes | Algorithm 1: Lancaster Interaction Test at d = 3
Open Source Code | Yes | Code Availability: Code to implement the permutation-free tests in this paper is available at https://github.com/barahona-research-group/PermFree-HOI.git.
Open Datasets | Yes | Here, we use two datasets (A and B) from Sejdinovic et al. (2013a)... Following closely Simulation 4 from Pfister et al. (2018), we simulate n samples from one DAG with d = 4 variables (see Appendix D). ...daily returns of stocks in the S&P 500 from 2020 to 2024.
Dataset Splits | Yes | To extend the permutation-free pairwise independence measure of Shekhar et al. (2023) to the multivariate setting, we first define the unnormalised permutation-free dHSIC using the data-splitting technique: $\mathrm{xdHSIC} = \left\langle \hat{\mu}_1(P_{1\cdots d}) - \hat{\mu}_1(P_1 \cdots P_d),\ \hat{\mu}_2(P_{1\cdots d}) - \hat{\mu}_2(P_1 \cdots P_d) \right\rangle_{\mathrm{HS}}$, where $\hat{\mu}_1(\cdot)$ and $\hat{\mu}_2(\cdot)$ are the empirical embeddings in dHSIC, estimated from two disjoint sample sets: the first half $S_1^d = \{(x_i^1, \dots, x_i^d) : 1 \le i \le n\}$ and the second half $S_2^d = \{(x_i^1, \dots, x_i^d) : n+1 \le i \le 2n\}$ of the i.i.d. samples.
Hardware Specification | Yes | The CPU time for computing high-order interaction percentages took 8 hours without any parallelisation on a 2015 iMac with a 4 GHz quad-core Intel Core i7 processor and 32 GB of 1867 MHz DDR3 memory.
Software Dependencies | No | The paper does not explicitly list software names with specific version numbers (e.g., Python 3.x, PyTorch 1.x) in the main text or appendices, beyond general mentions like 'scikit-learn: Machine learning in Python' (Pedregosa et al., 2011).
Experiment Setup | Yes | Unless noted otherwise, the significance level is set to α = 0.05, and we use Gaussian kernels with bandwidth given by the median heuristic. Additional details of each experiment are available in Appendix E. Each experiment uses n = 500 samples... we compare the accuracy of our permutation-free xdHSIC against the permutation-based dHSIC (using p = 200 permutations) to recover the ground truth DAG.
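The data-splitting construction quoted in the Dataset Splits row can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation from the linked repository: the function names are made up here, the variables are restricted to scalars, and the Gaussian kernel with median-heuristic bandwidth follows the choice described in the Experiment Setup row. Each inner product between the two halves' embeddings reduces to operations on per-variable cross-Gram matrices.

```python
import numpy as np

def median_bandwidth(x):
    """Median heuristic: bandwidth = median of nonzero pairwise distances."""
    d = np.abs(x[:, None] - x[None, :])
    return np.median(d[d > 0])

def gaussian_gram(x, y, bw):
    """Gaussian-kernel Gram matrix between two scalar samples."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2 * bw ** 2))

def xdhsic(samples):
    """Unnormalised permutation-free dHSIC via data splitting (sketch).

    samples: list of d arrays of length 2n, one per variable.
    Returns <mu1_joint - mu1_prod, mu2_joint - mu2_prod>_HS, where the
    embeddings mu1 and mu2 are estimated on the two disjoint halves.
    """
    n = len(samples[0]) // 2
    # One cross-Gram matrix per variable: first half vs second half.
    G = np.stack([gaussian_gram(x[:n], x[n:2 * n], median_bandwidth(x))
                  for x in samples])                     # shape (d, n, n)
    joint = G.prod(axis=0)                               # prod_m k_m(x_i^m, y_j^m)
    A = joint.mean()                                     # <mu1_joint, mu2_joint>
    B = G.mean(axis=2).prod(axis=0).mean()               # <mu1_joint, mu2_prod>
    C = G.mean(axis=1).prod(axis=0).mean()               # <mu1_prod,  mu2_joint>
    D = G.reshape(len(samples), -1).mean(axis=1).prod()  # <mu1_prod,  mu2_prod>
    return A - B - C + D
```

Because the two embedding differences come from disjoint halves, the statistic fluctuates around zero under joint independence and concentrates on a positive value under dependence, which is what makes calibration without permutations possible.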
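The "standard normality and controlled type-I errors" quoted in the Research Type row refer to the fact that, after studentisation, the data-split statistic is asymptotically standard normal under the null, so a single normal quantile replaces the whole permutation distribution. The sketch below is an illustrative construction in the spirit of the cross-statistics of Shekhar et al. (2023), not the paper's code; the particular studentisation via per-sample witness evaluations is an assumption of this sketch.

```python
import numpy as np
from math import erf, sqrt

def studentized_xdhsic(samples):
    """Studentised data-split statistic with a one-sided normal p-value (sketch).

    samples: list of d arrays of length 2n, one per variable.
    Under joint independence the returned statistic t is approximately
    N(0, 1), so no permutations are needed to calibrate the test.
    """
    n = len(samples[0]) // 2
    grams = []
    for x in samples:
        d = np.abs(x[:, None] - x[None, :])
        bw = np.median(d[d > 0])                         # median-heuristic bandwidth
        grams.append(np.exp(-((x[:n, None] - x[None, n:2 * n]) ** 2)
                            / (2 * bw ** 2)))            # half 1 vs half 2 Gram
    G = np.stack(grams)                                  # shape (d, n, n)
    # Per-sample witness evaluations of first-half points against the
    # second-half embeddings: <phi(x_i), mu2_joint> - <phi(x_i), mu2_prod>.
    u = G.prod(axis=0).mean(axis=1) - G.mean(axis=2).prod(axis=0)
    # Constant term <mu1_prod, mu2_joint - mu2_prod>, shared by all i.
    c = (G.mean(axis=1).prod(axis=0).mean()
         - G.reshape(len(grams), -1).mean(axis=1).prod())
    t = sqrt(n) * (u.mean() - c) / u.std(ddof=1)
    p = 0.5 * (1 - erf(t / sqrt(2)))                     # one-sided P(Z > t)
    return t, p
```

A one-sided test is natural here since dependence pushes the statistic to positive values; rejecting when p < α = 0.05 matches the significance level quoted in the Experiment Setup row.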