Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Kernel Partial Correlation Coefficient --- a Measure of Conditional Dependence

Authors: Zhen Huang, Nabarun Deb, Bodhisattva Sen

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive simulation and real-data examples illustrate the superior performance of our methods compared to existing procedures. ... In this section, we report the finite sample performance of ρ̂², ρ̃² and the related variable selection algorithms, on both simulated and real data examples. We consider both Euclidean and non-Euclidean responses Y in our examples. Even when restricted to Euclidean settings, our algorithms achieve superior performance compared to existing methods. All results are reproducible using our R package KPC (Huang, 2021) available on CRAN.
Researcher Affiliation Academia Zhen Huang EMAIL Nabarun Deb EMAIL Bodhisattva Sen EMAIL Department of Statistics Columbia University New York, NY 10027, USA
Pseudocode Yes Algorithm 1: KFOCI: a forward stepwise variable selection algorithm ... Algorithm 2: Forward stepwise variable selection algorithm using ρ2
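The forward stepwise pattern that Algorithms 1 and 2 share can be sketched generically. This is a hedged illustration, not the paper's implementation: `score` is a hypothetical stand-in for the estimated dependence coefficient (e.g. the kernel partial correlation of the response with a candidate covariate set), and the stopping rule shown (stop when no addition improves the score) is one simple variant.

```python
# Generic forward stepwise variable selection, in the spirit of the
# paper's Algorithm 1 (KFOCI). `score(subset)` is a hypothetical
# dependence estimate for the covariate subset; higher is better.
def forward_stepwise_select(num_covariates, score):
    """Greedily add the covariate that most improves `score(subset)`;
    stop when no single addition improves the current best score."""
    selected = []
    best = score(selected)  # baseline: score of the empty set
    while len(selected) < num_covariates:
        remaining = [j for j in range(num_covariates) if j not in selected]
        # Evaluate the score of adding each remaining covariate.
        gains = {j: score(selected + [j]) for j in remaining}
        j_star = max(gains, key=gains.get)
        if gains[j_star] <= best:  # no improvement: stop
            break
        selected.append(j_star)
        best = gains[j_star]
    return selected
```

With a toy score that rewards the (hypothetical) truly relevant set {0, 2} and slightly penalizes subset size, `forward_stepwise_select(4, score)` recovers `[0, 2]` and stops.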
Open Source Code Yes All results are reproducible using our R package KPC (Huang, 2021) available on CRAN.
Open Datasets Yes Spambase data: This data set is available from the UCI Machine Learning Repository (Dua and Graff, 2017)... Election data (histogram-valued response): Consider the 2017 Korean presidential election data collected by https://github.com/OhmyNews/2017-Election, which has been analyzed in the recent paper Jeon and Park (2020). ... Surgical data: The surgical data, available in the R package olsrr (Hebbali, 2018)
Dataset Splits Yes Spambase data: ...We then assign each data point with probability 0.8 (0.2) to training (test) set, and fit a random forest model (implemented in the R package randomForest (Liaw and Wiener, 2002) using default settings) with only the selected covariates. The mean squared error (MSE) on the test set is reported on the right panel of Figure 4.
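The split quoted above assigns each data point to the training set independently with probability 0.8 (so the 80/20 proportions are only in expectation, not exact). A minimal Python sketch of that per-point assignment, with illustrative names:

```python
import random

# Per-point random 0.8/0.2 assignment to training/test, as described
# for the Spambase experiment: each index goes to the training set
# independently with probability p_train. Names are illustrative.
def split_indices(n, p_train=0.8, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    train, test = [], []
    for i in range(n):
        (train if rng.random() < p_train else test).append(i)
    return train, test

train_idx, test_idx = split_indices(1000)
# Roughly 80% of the 1000 indices land in train_idx.
```

Note that unlike a fixed 80/20 partition, the realized training-set size varies from run to run around 0.8·n.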
Hardware Specification No The paper does not explicitly mention any specific hardware used for running the experiments. It discusses software and datasets but provides no details on GPUs, CPUs, or memory.
Software Dependencies Yes All results are reproducible using our R package KPC (Huang, 2021) available on CRAN. ... FOCI (Azadkia et al., 2020), implemented in the R package FOCI ... olsrr (Hebbali, 2018). ... VSURF (Genuer et al., 2019). ... glmnet (Friedman et al., 2010) and the Dantzig selector was implemented using the package hdme (Sorensen, 2019); ... R package lars (Hastie and Efron, 2013). ... R package randomForest (Liaw and Wiener, 2002). ... R package party (Hothorn et al., 2015).
Experiment Setup Yes For ρ̃², we set ε_n = 10^(-3) n^(-0.4) for all the three models considered here... For KFOCI, we still use 1-, 2-, 10-NN graphs and MST as before... For Algorithm 2 denoted by KPC (RKHS), we set the kernel on Y as the same kernel for the methods 1-NN / 10-NN; the kernel on X_S is taken as k_{X_S}(x, x') = exp(-‖x - x'‖²_{ℝ^{|S|}} / |S|), and ε = 10^(-3).
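The two quantities in the setup quote can be computed directly. This sketch assumes the garbled formulas read ε_n = 10⁻³ n⁻⁰·⁴ and a Gaussian kernel k(x, x') = exp(−‖x − x'‖² / |S|) on ℝ^{|S|}, where |S| is the number of selected covariates; both readings are reconstructions from the extraction-damaged text.

```python
import math

# Regularization level epsilon_n = 1e-3 * n^(-0.4), shrinking with
# sample size n (reconstructed reading of the quoted setup).
def epsilon_n(n):
    return 1e-3 * n ** (-0.4)

# Gaussian kernel on R^{|S|} with bandwidth |S|, the dimension of the
# selected covariate subset: k(x, x') = exp(-||x - x'||^2 / |S|).
def gaussian_kernel(x, x_prime):
    d = len(x)  # |S|
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / d)
```

Scaling the bandwidth with |S| keeps typical squared distances, which grow with dimension, on a comparable scale as covariates are added.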