Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Kernel Partial Correlation Coefficient --- a Measure of Conditional Dependence

Authors: Zhen Huang, Nabarun Deb, Bodhisattva Sen

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive simulation and real-data examples illustrate the superior performance of our methods compared to existing procedures. ... In this section, we report the finite sample performance of ρ̂², ρ̃² and the related variable selection algorithms, on both simulated and real data examples. We consider both Euclidean and non-Euclidean responses Y in our examples. Even when restricted to Euclidean settings, our algorithms achieve superior performance compared to existing methods. All results are reproducible using our R package KPC (Huang, 2021) available on CRAN.
Researcher Affiliation Academia Zhen Huang EMAIL Nabarun Deb EMAIL Bodhisattva Sen EMAIL Department of Statistics Columbia University New York, NY 10027, USA
Pseudocode Yes Algorithm 1: KFOCI: a forward stepwise variable selection algorithm ... Algorithm 2: Forward stepwise variable selection algorithm using ρ2
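The forward stepwise pattern that Algorithms 1 and 2 share can be sketched generically. This is a hedged illustration, not the paper's implementation: `score` is a hypothetical stand-in for the estimated dependence coefficient (e.g. the kernel partial correlation of the response with a candidate covariate set), and the stopping rule shown (stop when no addition improves the score) is one simple variant.

```python
# Generic forward stepwise variable selection, in the spirit of the
# paper's Algorithm 1 (KFOCI). `score(subset)` is a hypothetical
# dependence estimate for the covariate subset; higher is better.
def forward_stepwise_select(num_covariates, score):
    """Greedily add the covariate that most improves `score(subset)`;
    stop when no single addition improves the current best score."""
    selected = []
    best = score(selected)  # baseline: score of the empty set
    while len(selected) < num_covariates:
        remaining = [j for j in range(num_covariates) if j not in selected]
        # Evaluate the score of adding each remaining covariate.
        gains = {j: score(selected + [j]) for j in remaining}
        j_star = max(gains, key=gains.get)
        if gains[j_star] <= best:  # no improvement: stop
            break
        selected.append(j_star)
        best = gains[j_star]
    return selected
```

With a toy score that rewards the (hypothetical) truly relevant set {0, 2} and slightly penalizes subset size, `forward_stepwise_select(4, score)` recovers `[0, 2]` and stops.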
Open Source Code Yes All results are reproducible using our R package KPC (Huang, 2021) available on CRAN.
Open Datasets Yes Spambase data: This data set is available from the UCI Machine Learning Repository (Dua and Graff, 2017)... Election data (histogram-valued response): Consider the 2017 Korean presidential election data collected by https://github.com/OhmyNews/2017-Election, which has been analyzed in the recent paper Jeon and Park (2020). ... Surgical data: The surgical data, available in the R package olsrr (Hebbali, 2018)
Dataset Splits Yes Spambase data: ...We then assign each data point with probability 0.8 (0.2) to training (test) set, and fit a random forest model (implemented in the R package randomForest (Liaw and Wiener, 2002) using default settings) with only the selected covariates. The mean squared error (MSE) on the test set is reported on the right panel of Figure 4.
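The split quoted above assigns each data point to the training set independently with probability 0.8 (so the 80/20 proportions are only in expectation, not exact). A minimal Python sketch of that per-point assignment, with illustrative names:

```python
import random

# Per-point random 0.8/0.2 assignment to training/test, as described
# for the Spambase experiment: each index goes to the training set
# independently with probability p_train. Names are illustrative.
def split_indices(n, p_train=0.8, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    train, test = [], []
    for i in range(n):
        (train if rng.random() < p_train else test).append(i)
    return train, test

train_idx, test_idx = split_indices(1000)
# Roughly 80% of the 1000 indices land in train_idx.
```

Note that unlike a fixed 80/20 partition, the realized training-set size varies from run to run around 0.8·n.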
Hardware Specification No The paper does not explicitly mention any specific hardware used for running the experiments. It discusses software and datasets but provides no details on GPUs, CPUs, or memory.
Software Dependencies Yes All results are reproducible using our R package KPC (Huang, 2021) available on CRAN. ... FOCI (Azadkia et al., 2020), implemented in the R package FOCI ... olsrr (Hebbali, 2018). ... VSURF (Genuer et al., 2019). ... glmnet (Friedman et al., 2010) and the Dantzig selector was implemented using the package hdme (Sorensen, 2019); ... R package lars (Hastie and Efron, 2013). ... R package randomForest (Liaw and Wiener, 2002). ... R package party (Hothorn et al., 2015).
Experiment Setup Yes For ρ̃², we set ε_n = 10^(-3) n^(-0.4) for all the three models considered here... For KFOCI, we still use 1-, 2-, 10-NN graphs and MST as before... For Algorithm 2 denoted by KPC (RKHS), we set the kernel on Y as the same kernel for the methods 1-NN / 10-NN; the kernel on X_S is taken as k_{X_S}(x, x') = exp(-‖x - x'‖²_{ℝ^{|S|}} / |S|), and ε = 10^(-3).
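The two quantities in the setup quote can be computed directly. This sketch assumes the garbled formulas read ε_n = 10⁻³ n⁻⁰·⁴ and a Gaussian kernel k(x, x') = exp(−‖x − x'‖² / |S|) on ℝ^{|S|}, where |S| is the number of selected covariates; both readings are reconstructions from the extraction-damaged text.

```python
import math

# Regularization level epsilon_n = 1e-3 * n^(-0.4), shrinking with
# sample size n (reconstructed reading of the quoted setup).
def epsilon_n(n):
    return 1e-3 * n ** (-0.4)

# Gaussian kernel on R^{|S|} with bandwidth |S|, the dimension of the
# selected covariate subset: k(x, x') = exp(-||x - x'||^2 / |S|).
def gaussian_kernel(x, x_prime):
    d = len(x)  # |S|
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / d)
```

Scaling the bandwidth with |S| keeps typical squared distances, which grow with dimension, on a comparable scale as covariates are added.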