Computational Limits of a Distributed Algorithm for Smoothing Spline
Authors: Zuofeng Shang, Guang Cheng
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: mean-square errors (MSE) of the estimator based on 500 independent replications under different choices of N and s. Figure 2: computing time of the estimator based on a single replication under different choices of s when N = 10,000. Acknowledgments: "We thank PhD student Meimei Liu at Purdue for the simulation study." |
| Researcher Affiliation | Academia | Zuofeng Shang, Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis, Indianapolis, IN 46202, USA; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA |
| Pseudocode | No | The paper describes the divide-and-conquer method conceptually and shows a diagram of the data distribution process, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | No | The paper defines a nonparametric regression setup (y_l = f(l/N) + ε_l) and describes simulations based on a defined true function (f_0(z) = 0.6 b_{30,17}(z) + 0.4 b_{3,11}(z)), but it does not use or make available any specific public dataset. |
| Dataset Splits | No | The paper describes how the entire dataset of size N is divided into s subsets of size n (N = sn) for distributed processing across machines. This is a data-distribution strategy for the algorithm, not a training/validation/testing split for model evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the simulations or computations, such as CPU or GPU models, memory, or cloud infrastructure. |
| Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that would be required to reproduce the experiments. |
| Experiment Setup | No | The paper discusses theoretical optimal choices for parameters like the smoothing parameter λ and the number of machines s. It mentions choosing λ via a "distributed version of generalized cross validation (GCV)" and that simulations used "500 independent replications," but it does not provide concrete hyperparameter values, model initialization, or training configurations for any experiments. |
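To make the divide-and-conquer setup concrete, the following is a minimal sketch of the simulation described in the table: generate y_l = f_0(l/N) + ε_l on an equally spaced design, split the data into s subsets, fit a smoothing spline on each, and average the s fits. Reading b_{p,q} as the Beta(p, q) density is an assumption, and `scipy.interpolate.UnivariateSpline` stands in for the paper's smoothing-spline estimator; the function and parameter names below are illustrative, not the authors' code.

```python
import numpy as np
from scipy.stats import beta
from scipy.interpolate import UnivariateSpline

def f0(z):
    # True regression function from the paper's simulation,
    # reading b_{p,q} as the Beta(p, q) density (an assumption).
    return 0.6 * beta.pdf(z, 30, 17) + 0.4 * beta.pdf(z, 3, 11)

def divide_and_conquer_fit(x, y, s, rng, smooth=1.0):
    """Split (x, y) into s subsets, fit a spline on each, average the fits."""
    blocks = np.array_split(rng.permutation(len(x)), s)
    fits = []
    for block in blocks:
        order = np.argsort(x[block])            # spline needs increasing x
        xb, yb = x[block][order], y[block][order]
        # s=smooth*len(xb) targets a residual sum of squares of about n
        # (reasonable when the noise variance is 1); a stand-in for the
        # paper's GCV-chosen smoothing parameter.
        fits.append(UnivariateSpline(xb, yb, s=smooth * len(xb)))
    return lambda z: np.mean([f(z) for f in fits], axis=0)

rng = np.random.default_rng(0)
N, s = 10_000, 10                    # N = s * n with n = 1,000 per machine
x = np.arange(1, N + 1) / N          # equally spaced design points l/N
y = f0(x) + rng.normal(0.0, 1.0, N)  # y_l = f0(l/N) + eps_l
fbar = divide_and_conquer_fit(x, y, s, rng)

grid = np.linspace(0.01, 0.99, 200)
mse = np.mean((fbar(grid) - f0(grid)) ** 2)
```

Averaging the s local fits is what keeps per-machine cost at O(n) spline problems while recovering accuracy close to the full-data estimator, which is the trade-off the paper's Figures 1 and 2 examine.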