reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Computational Limits of A Distributed Algorithm for Smoothing Spline

Authors: Zuofeng Shang, Guang Cheng

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Figure 1: Mean-square errors (MSE) of f based on 500 independent replications under different choices of N and s. Figure 2: Computing time of f based on a single replication under different choices of s when N = 10,000. Acknowledgments: We thank PhD student Meimei Liu at Purdue for the simulation study.
Researcher Affiliation	Academia	Zuofeng Shang, Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis, Indianapolis, IN 46202, USA; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
Pseudocode	No	The paper describes the divide-and-conquer method conceptually and shows a diagram of the data distribution process, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository.
Open Datasets	No	The paper defines a nonparametric regression setup ($y_l = f(l/N) + \epsilon_l$) and describes simulations based on a defined function ($f_0(z) = 0.6\,b_{30,17}(z) + 0.4\,b_{3,11}(z)$), but it does not use or make available any specific public datasets.
Dataset Splits	No	The paper describes how the entire dataset of size N is divided into s subsets of size n for distributed processing across machines (N = sn). This is a data distribution strategy for the algorithm, not a description of training/testing/validation splits for model evaluation.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the simulations or computations, such as CPU or GPU models, memory, or cloud infrastructure.
Software Dependencies	No	The paper does not mention any specific software or library names with version numbers that would be required to reproduce the experiments.
Experiment Setup	No	The paper discusses theoretical optimal choices for parameters like the smoothing parameter λ and the number of machines s. It mentions choosing λ via a "distributed version of generalized cross validation (GCV)" and that simulations used "500 independent replications," but it does not provide concrete hyperparameter values, model initialization, or training configurations for any experiments.
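
The simulation setup summarized in the table (the regression model, the test function, and the division of N observations into s subsets of size n) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes b_{a1,a2} denotes the Beta(a1, a2) density, that the noise is standard Gaussian, and that observations are assigned to subsets at random. The per-subset estimator is a penalized cosine-basis regression used as a stand-in for the smoothing spline, and the function names, the basis size K, and the value of lam are all hypothetical. The distributed GCV step for choosing the smoothing parameter is not reproduced.

```python
import math
import numpy as np


def beta_pdf(z, a, b):
    """Density of Beta(a, b) evaluated at z in [0, 1]."""
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return norm * z ** (a - 1) * (1.0 - z) ** (b - 1)


def f0(z):
    # Assumption: b_{a1,a2} in the table is the Beta(a1, a2) density.
    return 0.6 * beta_pdf(z, 30, 17) + 0.4 * beta_pdf(z, 3, 11)


def simulate(N, rng, sigma=1.0):
    # Model: y_l = f0(l/N) + eps_l, l = 1..N. Gaussian noise is an assumption.
    x = np.arange(1, N + 1) / N
    y = f0(x) + sigma * rng.standard_normal(N)
    return x, y


def split_indices(N, s, rng):
    # Partition {0..N-1} into s subsets of size n = N / s.
    # Random assignment is an assumption; the table only states N = s * n.
    if N % s != 0:
        raise ValueError("N must be divisible by s")
    return rng.permutation(N).reshape(s, N // s)


def cosine_basis(x, K):
    # Columns: 1, sqrt(2) * cos(pi * k * x) for k = 1..K-1.
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(K)))
    B[:, 0] = 1.0
    return B


def fit_penalized(x, y, lam, K=30):
    # Stand-in for a smoothing spline: ridge-type fit whose penalty grows
    # like (pi * k)^4, so high-frequency terms are shrunk more strongly.
    B = cosine_basis(x, K)
    penalty = np.diag((np.pi * np.arange(K)) ** 4)
    n = len(y)
    return np.linalg.solve(B.T @ B / n + lam * penalty, B.T @ y / n)


def dc_estimate(x, y, s, lam, rng, K=30):
    # Divide and conquer: fit on each subset, then average the estimates.
    # Averaging coefficients equals averaging the fitted functions here,
    # because every subset uses the same basis.
    parts = split_indices(len(y), s, rng)
    coefs = [fit_penalized(x[idx], y[idx], lam, K) for idx in parts]
    return np.mean(coefs, axis=0)


rng = np.random.default_rng(0)
x, y = simulate(1000, rng)
coef = dc_estimate(x, y, s=10, lam=1e-6, rng=rng)
grid = np.linspace(0.01, 0.99, 99)
mse = np.mean((cosine_basis(grid, 30) @ coef - f0(grid)) ** 2)
print(f"MSE on grid: {mse:.4f}")
```

Repeating the last block over many seeds and several values of s would give an MSE-versus-s curve in the spirit of Figure 1, though the numbers would depend on the assumptions listed above and need not match the paper.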