reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes

Authors: Georg Manten, Cecilia Casolo, Emilio Ferrucci, Søren Mogensen, Cristopher Salvi, Niki Kilbertus

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively benchmark the CI test in isolation and as part of our causal discovery algorithms, outperforming existing approaches in SDE models and beyond.
Researcher Affiliation	Academia	Georg Manten and Cecilia Casolo: Technical University of Munich, Helmholtz Munich, Munich Center for Machine Learning. Emilio Ferrucci: Mathematical Institute, University of Oxford. Søren Wengel Mogensen: Department of Automatic Control, Lund University. Cristopher Salvi: Department of Mathematics, Imperial College London. Niki Kilbertus: Technical University of Munich, Helmholtz Munich, Munich Center for Machine Learning.
Pseudocode	Yes	Algorithm 1: Causal discovery for acyclic SDEs.
Open Source Code	Yes	We will also make all code used to produce the results in this paper openly available.
Open Datasets	Yes	To demonstrate the applicability of our developed methods on real data, we evaluate pairs trading strategies on ten stocks from the VBR Small Cap ETF over a three-year period (2010/01/01 to 2012/12/31). ... stock price data is downloaded from Yahoo Finance for a predefined list of stocks over a specific period.
Dataset Splits	Yes	stock price data is downloaded from Yahoo Finance for a predefined list of stocks over a specific period, divided into training (1st January 2010 to 31st December 2011) and trading intervals (1st January 2012 to 31st December 2012).
Hardware Specification	No	The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models, memory specifications) used for running its experiments. It only refers to general computational aspects like highly parallelized execution on GPU accelerators when discussing the signature kernel implementation.
Software Dependencies	No	We use sigkerax for the signature kernel with an RBF kernel... For the Granger implementation for two variables (d = 2), we used Seabold & Perktold (2010), for CCM we used Javier (2021), and for PCMCI we used the tigramite package (Runge et al., 2019). In PCMCI, tests for edges are conducted by applying distance correlation-based independence tests (Székely et al., 2007) between the variables' residuals after regressing out other nodes using Gaussian processes. For the SCOTCH implementation (Wang et al., 2024), we use the package causica.
Experiment Setup	Yes	For + s,h, s = 0.1 T (and a fixed T = 1) performed best (Table 5). ... We use sigkerax for the signature kernel with an RBF kernel with length scale selected via a median heuristic... In all the experiments of the paper, in the implementation of the signature kernel we use a depth parameter of 4, the RBF kernel and we add time as an extra dimension to the signature kernel. ... The number of bootstrap samples over permutation test is set to 100, the number of permutations for a single permutation test to 20000, the number of null samples via Monte Carlo from all values in permutation test to 20000 and 1000 for HSIC, the number of null samples for KCIT-bootstrap to 20000 and the number of null samples for SDCIT is set to 1000. ... We tested SCOTCH using various sparsity parameters and epochs to identify the optimal configuration. ... Table 9 confirms that the configuration with λ = 200 and ne = 2000 outperforms others... For SCOTCH, we always use a learning rate of 0.001 and keep the same default parameters for the learning algorithm.
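
The Experiment Setup row mentions two generic ingredients: an RBF length scale chosen by a median heuristic, and p-values obtained from permutation tests. The sketch below illustrates both in plain Python under stated assumptions. It is not the authors' code and does not use the sigkerax API. The function names, the default of 1000 permutations, and the absolute-covariance statistic are hypothetical placeholders; the paper's tests use signature-kernel statistics on paths instead.

```python
import math
import random
from statistics import median


def median_heuristic(points):
    """Median pairwise Euclidean distance, a common RBF length-scale choice."""
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    return median(dists)


def rbf_kernel(p, q, length_scale):
    """RBF kernel k(p, q) = exp(-||p - q||^2 / (2 * length_scale^2))."""
    return math.exp(-math.dist(p, q) ** 2 / (2.0 * length_scale ** 2))


def abs_covariance(xs, ys):
    """Placeholder dependence statistic (assumption, not the paper's statistic)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return abs(sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs))


def permutation_p_value(statistic, xs, ys, n_permutations=1000, seed=0):
    """Permutation p-value for independence of the paired samples xs and ys."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling ys breaks the pairing, which simulates the null hypothesis.
        rng.shuffle(shuffled)
        if statistic(xs, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)
```

For example, `permutation_p_value(abs_covariance, xs, ys, n_permutations=20000)` would mirror the number of permutations quoted for a single permutation test, although the statistic here is only a stand-in.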