Assessing the Overall and Partial Causal Well-Specification of Nonlinear Additive Noise Models
Authors: Christoph Schultheiss, Peter Bühlmann
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an algorithm for finite sample data, discuss its asymptotic properties, and illustrate its performance on simulated and real data. We evaluate the method on a simple SCM represented by the DAG in Figure 3. We consider the K562 data set provided by Replogle et al. (2022). |
| Researcher Affiliation | Academia | Christoph Schultheiss EMAIL Seminar for Statistics ETH Zürich Zürich, 8092, CH Peter Bühlmann EMAIL Seminar for Statistics ETH Zürich Zürich, 8092, CH |
| Pseudocode | Yes | Algorithm 1: In-sample FOCI; Algorithm 2: Sample splitting FOCI; Algorithm 3: Selection of variables with well-specified effect using multiple splits |
| Open Source Code | Yes | Code scripts to reproduce the results presented in this paper are available here https://github.com/cschultheiss/nl_GOF. |
| Open Datasets | Yes | We consider the K562 data set provided by Replogle et al. (2022). We follow the preprocessing in the benchmark of Chevalley et al. (2023). |
| Dataset Splits | Yes | Split the data uniformly at random into two disjoint parts of sizes ⌊n/2⌋ and ⌈n/2⌉, say, (x(1), y(1)) and (x(2), y(2)) |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, or memory specifications. It only mentions the software used for regression (xgboost) and the sample sizes tested. |
| Software Dependencies | No | The paper mentions the use of the R packages xgboost (Chen et al., 2021), FOCI (Azadkia et al., 2021), dHSIC (Pfister and Peters, 2019), and mgcv (Wood, 2011), but does not explicitly state the version numbers of these software components within the main text. |
| Experiment Setup | Yes | We apply Algorithm 3 with B = 25 splits and the absolute value function as g(·). For the regression, we apply eXtreme Gradient Boosting implemented in the R-package xgboost (Chen et al., 2021). We use the respective left-out split of the data for early stopping when fitting the regression functions. |
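The split-and-refit loop described in the rows above (random half-splits, a regression fit on one half, residuals evaluated on the other, repeated over B = 25 splits with g the absolute value) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the paper fits xgboost models with early stopping in R, while here an ordinary least-squares fit stands in for the regression, and the function names `sample_split` and `multi_split_residual_check` are hypothetical.

```python
import numpy as np

def sample_split(x, y, rng):
    """Split (x, y) uniformly at random into two disjoint parts
    of sizes floor(n/2) and ceil(n/2)."""
    n = len(y)
    perm = rng.permutation(n)
    half = n // 2
    i1, i2 = perm[:half], perm[half:]
    return (x[i1], y[i1]), (x[i2], y[i2])

def multi_split_residual_check(x, y, B=25, g=np.abs, seed=0):
    """Illustrative multi-split loop: for each of B random splits,
    fit a regression on one half and evaluate the mean of
    g(residuals) on the held-out half. The paper uses xgboost with
    early stopping; ordinary least squares is a stand-in here."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(B):
        (x1, y1), (x2, y2) = sample_split(x, y, rng)
        # stand-in regression: least squares with an intercept term
        X1 = np.column_stack([np.ones(len(x1)), x1])
        coef, *_ = np.linalg.lstsq(X1, y1, rcond=None)
        X2 = np.column_stack([np.ones(len(x2)), x2])
        resid = y2 - X2 @ coef
        stats.append(g(resid).mean())
    return np.array(stats)
```

One statistic per split is returned, so downstream aggregation over the B splits (as in Algorithm 3) can be applied to the resulting array.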