reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nonparametric Principal Subspace Regression

Authors: Yang Zhou, Mark Koudstaal, Dengdeng Yu, Dehan Kong, Fang Yao

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Favorable finite-sample performance is illustrated through simulated and real data examples in Section 4 and 5, respectively.
Researcher Affiliation	Academia	School of Statistics Beijing Normal University Beijing 100875, China; Department of Statistical Sciences University of Toronto Toronto, ON M5S 3G3, Canada; Department of Probability & Statistics Center for Statistical Science Peking University Beijing 100871, China
Pseudocode	No	The paper describes a 'two-step fitting procedure' in Section 2.2, but it is presented as regular text, not as a structured pseudocode block or algorithm: 'Step 1. For a given r ≤ q, let Û[r] = (û1, ..., ûr) be the top r left singular vectors of data Y = (y1, ..., yn) ∈ ℝ^{p×n} from model (1); Step 2. Plug in Û[r] into RDn(G) and find the corresponding minimizers of the RDn(ûk, gk) by applying local polynomial smoothing for k = 1, ..., r separately, denoted by f̂1, ..., f̂r.'
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described. It only references links for publicly available datasets used in the experiments.
Open Datasets	Yes	We apply the proposed method to an EEG data set, which is available at https://archive.ics.uci.edu/ml/datasets/EEG+Database. For another data application, we analyze the motor task-related fMRI data from the Human Connectome Project (HCP) Data https://www.humanconnectome.org/.
Dataset Splits	Yes	For each subject, we randomly reserve 10% of data as the test set: Stest ⊂ {1, ..., 256} such that |Stest|/256 ≈ 10%, while using the rest as the training set, and report the prediction errors... Same as the above example, we randomly select 10% of data as the test set and the rest of the data as the training set for each subject.
Hardware Specification	No	No specific details about the hardware used for running the experiments are provided. The mention of '3 Tesla magnetic resonance imaging data' refers to data acquisition for the fMRI study, not the computational hardware for the proposed method.
Software Dependencies	No	The paper mentions using 'local polynomial regression with a Gaussian kernel' for implementation and 'five-fold cross-validation' for parameter selection, but does not specify any software names or version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup	No	The paper describes the generation of simulated data and the general approach for real data applications, including data dimensions and evaluation metrics. It mentions choosing bandwidth 'by the standard five-fold cross-validation' and selecting 'r' by AIC, but it does not provide concrete hyperparameter values or detailed configurations for the local polynomial regression or other experimental settings.
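
To make the quoted two-step fitting procedure (Pseudocode row) and the 10% random test split (Dataset Splits row) concrete, here is a minimal NumPy sketch. It is not the authors' code, since the paper releases none. It assumes one scalar covariate x_i per column of Y, and it assumes that Step 2 can be read as smoothing the projected scores ûk^T y_i against x_i. The local fit is linear (degree 1) with a Gaussian kernel and a fixed bandwidth. All function names, the polynomial degree, and the bandwidth are illustrative choices, because the paper's own settings are not reported.

```python
import numpy as np


def random_split(n, test_frac, rng):
    """Randomly reserve roughly test_frac of the indices 0..n-1 as a test set."""
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def top_left_singular_vectors(Y, r):
    """Step 1: top-r left singular vectors of the p x n data matrix Y."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r]


def local_linear_smooth(x_train, z_train, x_eval, bandwidth):
    """Local linear regression of z on x with a Gaussian kernel (assumed degree 1)."""
    x_eval = np.atleast_1d(x_eval)
    fitted = np.empty(x_eval.shape[0])
    for j, x0 in enumerate(x_eval):
        d = x_train - x0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel weights
        X = np.column_stack([np.ones_like(d), d])    # local design: intercept, slope
        WX = X * w[:, None]
        # Solve the weighted normal equations; the intercept is the fit at x0.
        beta = np.linalg.lstsq(X.T @ WX, WX.T @ z_train, rcond=None)[0]
        fitted[j] = beta[0]
    return fitted


def fit_predict(Y_train, x_train, x_eval, r, bandwidth):
    """Step 2 (as assumed here): smooth each projected score, then map back to R^p."""
    U_r = top_left_singular_vectors(Y_train, r)
    scores = U_r.T @ Y_train                         # r x n projected responses
    F = np.vstack([
        local_linear_smooth(x_train, scores[k], x_eval, bandwidth)
        for k in range(r)
    ])
    return U_r @ F                                   # p x m predictions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 256, 8                                    # synthetic sizes, for illustration
    x = np.sort(rng.uniform(0.0, 1.0, n))
    Y = np.outer(rng.normal(size=p), np.sin(2 * np.pi * x))
    Y += 0.1 * rng.normal(size=(p, n))
    train, test = random_split(n, 0.10, rng)
    pred = fit_predict(Y[:, train], x[train], x[test], r=1, bandwidth=0.05)
    print(pred.shape, float(np.mean((pred - Y[:, test]) ** 2)))
```

The sketch takes r and the bandwidth as inputs. According to the table above, the paper selects the bandwidth by five-fold cross-validation and r by AIC, but it reports no concrete values, so neither selection step is reproduced here.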