Assessing the Overall and Partial Causal Well-Specification of Nonlinear Additive Noise Models

Authors: Christoph Schultheiss, Peter Bühlmann

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We propose an algorithm for finite sample data, discuss its asymptotic properties, and illustrate its performance on simulated and real data." "We evaluate the method on a simple SCM represented by the DAG in Figure 3." "5. Real Data Analysis: We consider the K562 data set provided by Replogle et al. (2022)."
Researcher Affiliation | Academia | "Christoph Schultheiss, EMAIL, Seminar for Statistics, ETH Zürich, Zürich, 8092, CH. Peter Bühlmann, EMAIL, Seminar for Statistics, ETH Zürich, Zürich, 8092, CH."
Pseudocode | Yes | Algorithm 1: In-sample FOCI; Algorithm 2: Sample-splitting FOCI; Algorithm 3: Selection of variables with well-specified effect using multiple splits.
Open Source Code | Yes | "Code scripts to reproduce the results presented in this paper are available here: https://github.com/cschultheiss/nl_GOF."
Open Datasets | Yes | "We consider the K562 data set provided by Replogle et al. (2022). We follow the preprocessing in the benchmark of Chevalley et al. (2023)."
Dataset Splits | Yes | "Split the data uniformly at random into two disjoint parts of sizes ⌊n/2⌋ and ⌈n/2⌉, say, x(1), y(1) and x(2), y(2)."
Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, or memory specifications. It only mentions the software used for regression (xgboost) and the sample sizes tested.
Software Dependencies | No | The paper mentions the use of the R packages xgboost (Chen et al., 2021), FOCI (Azadkia et al., 2021), dHSIC (Pfister and Peters, 2019), and mgcv (Wood, 2011), but does not explicitly state the version numbers of these software components within the main text.
Experiment Setup | Yes | "We apply Algorithm 3 with B = 25 splits and the absolute value function as g(·). For the regression, we apply eXtreme Gradient Boosting implemented in the R-package xgboost (Chen et al., 2021). We use the respective left-out split of the data for early stopping when fitting the regression functions."
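The "Dataset Splits" row describes splitting the data uniformly at random into two disjoint halves of sizes ⌊n/2⌋ and ⌈n/2⌉. A minimal NumPy sketch of that step (the helper name is illustrative, not from the paper's code):

```python
import numpy as np

def random_half_split(x, y, seed=None):
    """Split (x, y) uniformly at random into two disjoint parts
    of sizes floor(n/2) and ceil(n/2), as in the sample-splitting step."""
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]
    return (x[first], y[first]), (x[second], y[second])

# Toy usage: 10 observations with 2 covariates each
x = np.arange(20).reshape(10, 2)
y = np.arange(10)
(x1, y1), (x2, y2) = random_half_split(x, y, seed=0)
```

With n = 10 the two parts have 5 observations each; for odd n the second part receives the extra observation.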
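The setup row outlines a multi-split scheme: repeat B = 25 random half-splits, fit a regressor on one half, and evaluate g(·) = |·| on the held-out residuals. A minimal runnable sketch of that loop, with an ordinary least-squares line standing in for the paper's xgboost regressor; all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def multi_split_residual_scores(x, y, fit, B=25, g=np.abs, seed=0):
    """For each of B random half-splits: fit a regressor on one half,
    then record the mean of g(residuals) on the held-out half.
    `fit(x, y)` must return a prediction function."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(B):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        predict = fit(x[train], y[train])
        scores.append(g(y[test] - predict(x[test])).mean())
    return np.array(scores)

# Stand-in regressor: least-squares line (the paper uses boosted trees
# with the left-out split for early stopping instead)
def ols_fit(x, y):
    coef = np.polyfit(x, y, deg=1)
    return lambda x_new: np.polyval(coef, x_new)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.1, size=200)
scores = multi_split_residual_scores(x, y, ols_fit, B=25)
```

The 25 per-split scores would then be aggregated by Algorithm 3's selection rule; the aggregation itself is specified in the paper, not reproduced here.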