Distributed Learning with Regularized Least Squares

Authors: Shao-Bo Lin, Xin Guo, Ding-Xuan Zhou

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study distributed learning with the least squares regularization scheme in a reproducing kernel Hilbert space (RKHS). By a divide-and-conquer approach, the algorithm partitions a data set into disjoint data subsets, applies the least squares regularization scheme to each data subset to produce an output function, and then takes an average of the individual output functions as a final global estimator or predictor. We show with error bounds and learning rates in expectation in both the L2-metric and RKHS-metric that the global output function of this distributed learning is a good approximation to the algorithm processing the whole data in one single machine. Our derived learning rates in expectation are optimal and stated in a general setting without any eigenfunction assumption. The analysis is achieved by a novel second order decomposition of operator differences in our integral operator approach. Even for the classical least squares regularization scheme in the RKHS associated with a general kernel, we give the best learning rate in expectation in the literature.
Researcher Affiliation | Academia | Shao-Bo Lin (EMAIL), Department of Mathematics, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong; Xin Guo (EMAIL), Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; Ding-Xuan Zhou (EMAIL), Department of Mathematics, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Pseudocode | No | The paper describes its algorithms and mathematical procedures in standard mathematical notation and prose, but it includes no clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code | No | The paper neither states that the authors are releasing code for the described methodology nor provides a link to a source-code repository, and it does not mention code in supplementary materials.
Open Datasets | No | The theoretical framework refers only to conceptual data sets or samples (e.g., 'a sample D = {(x_i, y_i)}_{i=1}^N ⊂ X × Y independently drawn according to ρ'). The paper provides no access information (links, citations, repository names) for any publicly available or open dataset used in empirical experiments.
Dataset Splits | No | The paper is theoretical and describes no empirical experiments on specific datasets. Although it discusses partitioning the data set D into m disjoint subsets {D_j}_{j=1}^m as part of the distributed learning algorithm, this is a conceptual description of the method, not a training/validation/test split of an actual dataset.
Hardware Specification | No | The paper is theoretical and reports no experiments or implementation details, so it mentions no hardware specifications such as GPU/CPU models, processors, or cloud resources.
Software Dependencies | No | The paper focuses on mathematical derivations and error analysis; it names no software, libraries, or version numbers that would be needed to reproduce experimental results.
Experiment Setup | No | The paper is theoretical and focuses on error analysis and learning rates. It describes no empirical experiments, so no experimental setup details such as hyperparameter values, training configurations, or system-level settings are provided.
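For intuition, the divide-and-conquer scheme the paper analyzes (partition the sample into m disjoint subsets, run least squares regularization on each, average the local estimators) can be sketched as distributed kernel ridge regression. The sketch below is illustrative only: the Gaussian kernel, its width, the regularization value, and all function names are assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Gaussian (RBF) kernel matrix between row-sample arrays A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(Xj, yj, lam):
    # Local least squares regularization on one subset:
    # solve (K + lam * n_j * I) alpha = y_j in the RKHS of the kernel.
    nj = len(Xj)
    K = gaussian_kernel(Xj, Xj)
    return np.linalg.solve(K + lam * nj * np.eye(nj), yj)

def distributed_krr(X, y, m, lam):
    # Partition the data into m disjoint subsets (contiguous blocks here),
    # fit a local estimator on each, and average their predictions.
    parts = np.array_split(np.arange(len(X)), m)
    models = [(X[idx], krr_fit(X[idx], y[idx], lam)) for idx in parts]

    def predict(Xnew):
        preds = [gaussian_kernel(Xnew, Xj) @ alpha for Xj, alpha in models]
        return np.mean(preds, axis=0)  # global estimator = average of locals

    return predict
```

A quick usage sketch on synthetic 1-D data: fit `distributed_krr(X, y, m=4, lam=1e-3)` and evaluate the averaged predictor on new inputs; the averaging step is what lets each machine see only N/m samples while the global estimator behaves like the single-machine one, which is the regime the paper's error bounds address.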