Assessing the Overall and Partial Causal Well-Specification of Nonlinear Additive Noise Models

Authors: Christoph Schultheiss, Peter Bühlmann

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We propose an algorithm for finite sample data, discuss its asymptotic properties, and illustrate its performance on simulated and real data." "We evaluate the method on a simple SCM represented by the DAG in Figure 3." "5. Real Data Analysis: We consider the K562 data set provided by Replogle et al. (2022)."
Researcher Affiliation | Academia | "Christoph Schultheiss, EMAIL, Seminar for Statistics, ETH Zürich, Zürich, 8092, CH. Peter Bühlmann, EMAIL, Seminar for Statistics, ETH Zürich, Zürich, 8092, CH."
Pseudocode | Yes | Algorithm 1: In-sample FOCI; Algorithm 2: Sample-splitting FOCI; Algorithm 3: Selection of variables with well-specified effect using multiple splits.
Open Source Code | Yes | "Code scripts to reproduce the results presented in this paper are available here: https://github.com/cschultheiss/nl_GOF."
Open Datasets | Yes | "We consider the K562 data set provided by Replogle et al. (2022). We follow the preprocessing in the benchmark of Chevalley et al. (2023)."
Dataset Splits | Yes | "Split the data uniformly at random into two disjoint parts of sizes ⌊n/2⌋ and ⌈n/2⌉, say, x(1), y(1) and x(2), y(2)."
Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, or memory specifications. It only mentions the software used for regression (xgboost) and the sample sizes tested.
Software Dependencies | No | The paper mentions the use of the R packages xgboost (Chen et al., 2021), FOCI (Azadkia et al., 2021), dHSIC (Pfister and Peters, 2019), and mgcv (Wood, 2011), but does not explicitly state the version numbers of these software components within the main text.
Experiment Setup | Yes | "We apply Algorithm 3 with B = 25 splits and the absolute value function as g(·). For the regression, we apply eXtreme Gradient Boosting implemented in the R-package xgboost (Chen et al., 2021). We use the respective left-out split of the data for early stopping when fitting the regression functions."
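The "Dataset Splits" row describes splitting the data uniformly at random into two disjoint halves of sizes ⌊n/2⌋ and ⌈n/2⌉. A minimal NumPy sketch of that step (the helper name is illustrative, not from the paper's code):

```python
import numpy as np

def random_half_split(x, y, seed=None):
    """Split (x, y) uniformly at random into two disjoint parts
    of sizes floor(n/2) and ceil(n/2), as in the sample-splitting step."""
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]
    return (x[first], y[first]), (x[second], y[second])

# Toy usage: 10 observations with 2 covariates each
x = np.arange(20).reshape(10, 2)
y = np.arange(10)
(x1, y1), (x2, y2) = random_half_split(x, y, seed=0)
```

With n = 10 the two parts have 5 observations each; for odd n the second part receives the extra observation.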
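The setup row outlines a multi-split scheme: repeat B = 25 random half-splits, fit a regressor on one half, and evaluate g(·) = |·| on the held-out residuals. A minimal runnable sketch of that loop, with an ordinary least-squares line standing in for the paper's xgboost regressor; all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def multi_split_residual_scores(x, y, fit, B=25, g=np.abs, seed=0):
    """For each of B random half-splits: fit a regressor on one half,
    then record the mean of g(residuals) on the held-out half.
    `fit(x, y)` must return a prediction function."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(B):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        predict = fit(x[train], y[train])
        scores.append(g(y[test] - predict(x[test])).mean())
    return np.array(scores)

# Stand-in regressor: least-squares line (the paper uses boosted trees
# with the left-out split for early stopping instead)
def ols_fit(x, y):
    coef = np.polyfit(x, y, deg=1)
    return lambda x_new: np.polyval(coef, x_new)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.1, size=200)
scores = multi_split_residual_scores(x, y, ols_fit, B=25)
```

The 25 per-split scores would then be aggregated by Algorithm 3's selection rule; the aggregation itself is specified in the paper, not reproduced here.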