Distributed Kernel Ridge Regression with Communications
Authors: Shao-Bo Lin, Di Wang, Ding-Xuan Zhou
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the last section, we conduct a series of numerical studies to verify the outperformance of DKRR with communications. ... We employ three criteria for comparisons. ... The testing results are shown in Figure 4 and Figure 5. Figure 4 shows the relation between MSE and the number of local machines by different numbers of communications. |
| Researcher Affiliation | Academia | Shao-Bo Lin, Center of Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China; Di Wang, Center of Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China; Ding-Xuan Zhou, School of Data Science and Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong, China. All listed institutions are universities. |
| Pseudocode | Yes | In order to derive an estimator with the operator representation (6), we propose a communication strategy for DKRR(ℓ) by iterating the following procedure for ℓ = 1, …, L. Step 1. Communicate the global estimator f^{ℓ−1}_{D,λ} to the local machines and get the local gradient function G_{D_j,λ,ℓ} := G_{D_j,λ,f^{ℓ−1}_{D,λ}}. Step 2. Communicate back {G_{D_j,λ,ℓ} : j = 1, …, m} to the global machine and synthesize the global gradient by G_{D,λ,ℓ} := Σ_{j=1}^{m} (&#124;D_j&#124;/&#124;D&#124;) G_{D_j,λ,ℓ}. ... Appendix B. Training and Testing Flows for DKRR(ℓ): Training Flow: Step 1 (local process). ... Step 7 (final estimator) |
| Open Source Code | No | The paper does not explicitly provide a link to source code, nor does it state that the code for the methodology described in this paper is publicly released or available in supplementary materials. |
| Open Datasets | No | The inputs {x_i}_{i=1}^{N} of the training samples are independently drawn according to the uniform distribution on the (hyper-)cube [0, 1]^d with d = 1 or d = 3. The corresponding outputs {y_i}_{i=1}^{N} are generated from the regression models y_i = g_j(x_i) + ε_i for i = 1, 2, …, N and j = 1, 2, where ε_i is independent Gaussian noise N(0, 0.2)... This indicates synthetic data generated by the authors, with no explicit public access provided. |
| Dataset Splits | Yes | We generate 10000 samples for training and 1000 samples for testing. ... The inputs {x′_i}_{i=1}^{N} of the testing samples are also drawn independently according to the uniform distribution on the (hyper-)cube [0, 1]^d, but the corresponding outputs {y′_i}_{i=1}^{N} are generated by y′_i = g_j(x′_i). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud computing instances) used for conducting the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or versions (e.g., programming languages, libraries, or frameworks) used for the implementation or experiments. |
| Experiment Setup | No | The paper states that 'Regularization parameters in all experiments are selected by grid search', but it does not provide the specific values of these parameters or any other hyperparameters (e.g., learning rates, batch sizes, or model initialization) used in the experiments. |
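Since the audit found no released code, the communication strategy quoted in the Pseudocode row (broadcast the global estimator, average the local gradients, apply Newton-type local corrections, average again) can only be sketched. The single-process simulation below is an interpretation assembled from the quoted steps, not the authors' implementation: the Gaussian kernel and its bandwidth, the regularization value, and the sine regression function are all assumptions; only the sampling scheme (inputs uniform on [0, 1], Gaussian noise with variance 0.2, a 10000/1000-style train/test idea scaled down) follows the quoted setup. Functions are represented by coefficient vectors over all N training points, which is convenient for a one-machine simulation even though a real deployment would only ship local summaries.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=5.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and B (assumed choice).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def dkrr_communications(X, y, m, lam, n_comm, gamma=5.0):
    """DKRR(l) sketch: averaged local KRR refined by Newton-type communications.

    A function f is stored as a coefficient vector a with f = sum_i a[i] K(x_i, .).
    """
    N = len(y)
    K = gaussian_kernel(X, X, gamma)
    blocks = np.array_split(np.arange(N), m)        # data partition across m machines

    # DKRR(0): local KRR on each machine, then |D_j|/|D|-weighted averaging.
    a = np.zeros(N)
    for idx in blocks:
        n_j = len(idx)
        K_jj = K[np.ix_(idx, idx)]
        a[idx] += (n_j / N) * np.linalg.solve(K_jj + n_j * lam * np.eye(n_j), y[idx])

    for _ in range(n_comm):
        # Global gradient of the regularized empirical risk at the current f,
        # i.e. the weighted average of the local gradients from Steps 1-2:
        #   G = (1/N) sum_i (f(x_i) - y_i) K_{x_i} + lam * f
        g = (K @ a - y) / N + lam * a

        # Each machine applies its local operator (lam*I + L_{D_j})^{-1} to the
        # global gradient; the global machine averages the resulting directions.
        step = np.zeros(N)
        for idx in blocks:
            n_j = len(idx)
            K_jj = K[np.ix_(idx, idx)]
            G_at_Xj = K[idx, :] @ g                  # G evaluated at machine j's inputs
            v = np.linalg.solve(lam * np.eye(n_j) + K_jj / n_j, G_at_Xj)
            h_j = g / lam                            # closed form for the local solve
            h_j[idx] -= v / (n_j * lam)
            step += (n_j / N) * h_j
        a = a - step
    return a, K

# Synthetic data in the spirit of the quoted setup: inputs uniform on [0, 1],
# outputs g(x) + Gaussian noise with variance 0.2. The sine target is a
# stand-in, not the paper's g_1 or g_2 (which are not quoted in the audit).
rng = np.random.default_rng(0)
N, m, lam = 400, 4, 0.1
X = rng.uniform(0, 1, size=(N, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, np.sqrt(0.2), N)

a0, K = dkrr_communications(X, y, m, lam, n_comm=0)   # one-shot averaging baseline
a5, _ = dkrr_communications(X, y, m, lam, n_comm=5)   # five communication rounds
alpha_full = np.linalg.solve(K + N * lam * np.eye(N), y)  # exact full-data KRR
gap0 = np.linalg.norm(K @ (a0 - alpha_full))
gap5 = np.linalg.norm(K @ (a5 - alpha_full))
```

Under this construction the full-data KRR solution is a fixed point of the iteration (its gradient is zero), so `gap5` should shrink toward zero as communications proceed, mirroring the paper's claim that DKRR(ℓ) closes the gap between one-shot averaging and centralized KRR.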