Computational Limits of A Distributed Algorithm for Smoothing Spline

Authors: Zuofeng Shang, Guang Cheng

JMLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Figure 1: Mean-square errors (MSE) of f based on 500 independent replications under different choices of N and s. Figure 2: Computing time of f based on a single replication under different choices of s when N = 10,000. Acknowledgments: We thank PhD student Meimei Liu at Purdue for the simulation study. |
| Researcher Affiliation | Academia | Zuofeng Shang EMAIL Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis, Indianapolis, IN 46202, USA; Guang Cheng EMAIL Department of Statistics, Purdue University, West Lafayette, IN 47907, USA |
| Pseudocode | No | The paper describes the divide-and-conquer method conceptually and shows a diagram of the data distribution process, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | No | The paper defines a nonparametric regression setup (y_l = f(l/N) + ϵ_l) and describes simulations based on a defined function (f_0(z) = 0.6 b_{30,17}(z) + 0.4 b_{3,11}(z)), but it does not use or make available any specific public datasets. |
| Dataset Splits | No | The paper describes how the entire dataset of size N is divided into s subsets of size n for distributed processing across machines (N = sn). This is a data distribution strategy for the algorithm, not a description of training/testing/validation splits for model evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the simulations or computations, such as CPU or GPU models, memory, or cloud infrastructure. |
| Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that would be required to reproduce the experiments. |
| Experiment Setup | No | The paper discusses theoretical optimal choices for parameters such as the smoothing parameter λ and the number of machines s. It mentions choosing λ via a "distributed version of generalized cross validation (GCV)" and that simulations used "500 independent replications," but it does not provide concrete hyperparameter values, model initialization, or training configurations for any experiments. |
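
The regression setup and the data distribution strategy described above can be illustrated with a minimal sketch. It is not the authors' code, and several pieces are assumptions: b_{a,b} is read as the Beta(a, b) density, the noise is taken to be Gaussian with a hypothetical standard deviation `sigma`, the N observations are assigned to the s machines at random, and the per-machine smoothing spline is replaced by a Nadaraya-Watson kernel smoother with a hypothetical bandwidth `h` so that the example needs only the standard library. Only the overall pattern (split into s subsets of size n, fit locally, average the s fits) follows the description.

```python
import math
import random


def beta_pdf(z, a, b):
    """Density of Beta(a, b) at z; assumes a, b > 1 so it vanishes at 0 and 1."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(z) + (b - 1) * math.log(1 - z))


def f0(z):
    """Simulation function, reading b_{a,b} as the Beta(a, b) density (assumption)."""
    return 0.6 * beta_pdf(z, 30, 17) + 0.4 * beta_pdf(z, 3, 11)


def simulate(N, sigma, rng):
    """Draw y_l = f0(l/N) + eps_l for l = 1..N with Gaussian noise (assumption)."""
    xs = [l / N for l in range(1, N + 1)]
    ys = [f0(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def split(xs, ys, s, rng):
    """Randomly partition the N observations into s subsets of size n = N // s."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n = len(xs) // s
    return [[(xs[i], ys[i]) for i in idx[j * n:(j + 1) * n]] for j in range(s)]


def local_fit(subset, z, h):
    """Stand-in local estimator: Gaussian-kernel Nadaraya-Watson smoother.

    This replaces the per-machine smoothing spline purely for illustration.
    """
    num = den = 0.0
    for x, y in subset:
        w = math.exp(-0.5 * ((z - x) / h) ** 2)
        num += w * y
        den += w
    return num / den


def dc_estimate(subsets, z, h):
    """Divide-and-conquer estimate: average of the s local fits at z."""
    return sum(local_fit(sub, z, h) for sub in subsets) / len(subsets)


rng = random.Random(0)
N, s = 1000, 10                      # hypothetical sizes, so n = N / s = 100
xs, ys = simulate(N, sigma=1.0, rng=rng)
subsets = split(xs, ys, s, rng)

grid = [(k + 0.5) / 50 for k in range(50)]
mse = sum((dc_estimate(subsets, z, 0.03) - f0(z)) ** 2 for z in grid) / len(grid)
print(f"s = {s}, n = {N // s}, grid MSE = {mse:.4f}")
```

Repeating the last block for several values of s, and over many replications, would give a rough analogue of the MSE comparison reported in Figure 1, subject to the substitutions noted above.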