Learning causal graphs via nonlinear sufficient dimension reduction

Authors: Eftychia Solea, Bing Li, Kyongwon Kim

JMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of our methodology through simulations and a real data analysis. In this section, we evaluate the performance of our DAG estimator, referred to as the DAG-PC algorithm, through simulation comparisons with other methods and a data application.
Researcher Affiliation Academia Eftychia Solea, School of Mathematical Sciences, Queen Mary University of London, Mile End, E1 4NS, London, UK; Bing Li, Department of Statistics, Pennsylvania State University, 326 Thomas Building, University Park, PA 16802, US; Kyongwon Kim, Department of Applied Statistics / Department of Statistics and Data Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, South Korea
Pseudocode Yes We display the new version of the PC-algorithm in Algorithm 1 in the form of pseudocode. Algorithm 1 describes only the first part of the DAG-PC algorithm, which identifies the skeleton of the DAG.
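The paper's Algorithm 1 adapts the skeleton-search phase of the classical PC algorithm. As a generic sketch of that phase only (assuming an abstract conditional-independence oracle `indep_test`, a hypothetical name; this is not the authors' kernel-based test), the search starts from the complete undirected graph and deletes an edge whenever some conditioning set of neighbours renders its endpoints conditionally independent:

```python
from itertools import combinations

def pc_skeleton(nodes, indep_test):
    """Generic PC skeleton search. indep_test(i, j, S) should return True
    iff variables i and j are conditionally independent given the set S."""
    # Start from the complete undirected graph.
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    level = 0
    tested = True
    while tested:
        tested = False
        # Snapshot the ordered pairs before deleting edges at this level.
        pairs = [(i, j) for i in nodes for j in sorted(adj[i])]
        for i, j in pairs:
            if j not in adj[i]:
                continue  # edge already removed during this level
            neighbours = adj[i] - {j}
            if len(neighbours) < level:
                continue
            tested = True
            for S in combinations(sorted(neighbours), level):
                if indep_test(i, j, S):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    sepset[frozenset((i, j))] = set(S)
                    break
        level += 1  # grow the size of the conditioning sets
    return adj, sepset
```

For instance, with an oracle for the chain X → Y → Z (where X ⫫ Z | Y), the search keeps the edges X–Y and Y–Z and removes X–Z with separating set {Y}.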
Open Source Code No Methods A and B were implemented using the pcalg package (Kalisch et al., 2012) in R, while for Method C, we used the kpcalg package (Verbyla et al., 2017) in R. (This refers to the *other* methods' code, not the authors' own.) There is no explicit statement about releasing their own code.
Open Datasets Yes We apply our method to the flow cytometry dataset (Sachs et al., 2005)... This dataset can be downloaded from https://github.com/fernandoPalluzzi/SEMgraph.
Dataset Splits No The paper generates data with specific sample sizes (n = 100, 150, 200) for the simulations and uses n = 90 observations for the real data application, with subsampling for repeated evaluations of the latter. It does not specify explicit training/validation/test splits (percentages, counts, or a partitioning methodology) in the conventional machine-learning sense; the PC algorithm infers the graph directly from the given dataset.
Hardware Specification No No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments or simulations are mentioned in the paper.
Software Dependencies No The paper mentions specific R packages ('pcalg' and 'kpcalg') used to implement *comparison* methods (Method A, B, and C). However, it does not provide any specific, versioned software dependencies for the implementation of *their own* proposed methodology (Method D, DAG-PC algorithm).
Experiment Setup Yes In this subsection, we propose the tuning procedures involved in various steps in our method. Specifically, for step 1, the tuning parameters include the kernel parameters κ_{X_i}, i = 1, ..., p, the Tychonoff regularization tuning constants η_n and ε_n, and the dimension d_S^{ij} of the sufficient predictor Û_{ij,S}. For step 2, the tuning parameters include the number r_i of leading eigenvalues of G_{X_i}, the Tychonoff regularization parameter for the CCCO, δ_n, and the thresholding constant ρ_n in the estimation of the skeleton in (16)... For this reason, we present only the results for d_S^{ij} = 1.
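The constants η_n, ε_n, and δ_n quoted above play the familiar role of Tychonoff (Tikhonov, ridge-type) regularizers that keep kernel Gram-matrix inversions well-posed. As a generic numerical illustration only (not the authors' estimator; `regularized_inverse` is a hypothetical helper name):

```python
import numpy as np

def regularized_inverse(G, eta):
    """Tikhonov-regularized inverse (G + eta * I)^{-1}; a small eta > 0
    keeps a rank-deficient Gram matrix G numerically invertible."""
    return np.linalg.inv(G + eta * np.eye(G.shape[0]))

# A rank-1 Gram matrix: singular, so the plain inverse would fail.
X = np.array([[0.0], [1.0], [2.0]])
G = X @ X.T
G_inv = regularized_inverse(G, 1e-2)
```

In kernel methods, the regularization constant is typically sent to zero at a controlled rate as n grows, trading bias for stability of the inverse.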