Learning Curves of Stochastic Gradient Descent in Kernel Regression
Authors: Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Simulations: In this section, we show our experiments with the NTK kernel κ_NTK, which shows that the convergence rate of SGD matches our theoretical result. The data is generated as follows with a fixed f_ρ: y_i = f_ρ(x_i) + ε_i, i = 1, ..., n, where the x_i are i.i.d. samples from the uniform distribution on the sphere S^d, and the ε_i are i.i.d. N(0, 1). ... We numerically approximate the excess risk by the empirical excess risk on 1000 i.i.d. sampled data points from the uniform distribution on the sphere S^d. As shown in Figure 1, the results support our theoretical findings and indicate that SGD with an exponentially decaying step size does not suffer from the saturation effect. |
| Researcher Affiliation | Academia | 1National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 2Institute for Artificial Intelligence, Peking University. Correspondence to: Cong Fang <EMAIL>. |
| Pseudocode | No | The paper describes the SGD updates in mathematical formulas, such as f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t} in Section 4.1, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the methodology described. |
| Open Datasets | No | The data is generated as follows with a fixed f_ρ: y_i = f_ρ(x_i) + ε_i, i = 1, ..., n, where the x_i are i.i.d. samples from the uniform distribution on the sphere S^d, and the ε_i are i.i.d. N(0, 1). |
| Dataset Splits | No | The paper describes generating synthetic data and approximating the excess risk on 1000 i.i.d. sampled data points, but it does not specify explicit training/validation/test splits of the kind used when reproducing experiments on pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used in the research or simulations. |
| Experiment Setup | No | The paper specifies the generation process for the regression function f_ρ and the ranges for n and d in the simulations (e.g., n from 1000 to 2000, d = n^(2/3)). It describes the form of the step-size schedules (exponentially decaying, or constant with averaging) and their theoretical scaling (e.g., initial step size η_0 = Θ(d^(γ+p) log² n ln d)), but it does not give concrete numerical values, such as the exact constant in η_0, or other training-configuration details needed for reproduction beyond these theoretical scalings. |
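The simulation described in the quoted cells can be sketched in a few lines. This is a minimal illustration, not the authors' code: the target function f_ρ, the use of the first-order arc-cosine kernel as a stand-in for the paper's NTK kernel κ_NTK, and the step-size constants `eta0` and `decay` are all assumptions, since the paper only reports theoretical scalings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d, rng):
    # i.i.d. uniform samples on the sphere S^d (unit vectors in R^{d+1})
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def arccos_kernel(X, Z):
    # First-order arc-cosine kernel on the sphere; assumed stand-in for
    # the paper's NTK kernel κ_NTK (not the paper's exact kernel).
    u = np.clip(X @ Z.T, -1.0, 1.0)
    theta = np.arccos(u)
    return (np.sin(theta) + (np.pi - theta) * u) / np.pi

n, d = 1000, 10                  # paper: n in [1000, 2000], d = n^(2/3)
X = sample_sphere(n, d, rng)
f_rho = lambda x: x[:, 0]        # hypothetical target function f_ρ
y = f_rho(X) + rng.standard_normal(n)   # y_i = f_ρ(x_i) + ε_i, ε_i ~ N(0, 1)

# Single-pass kernel SGD, f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t},
# tracked via dual coefficients alpha so that f(x) = Σ_i alpha_i K(x_i, x).
K = arccos_kernel(X, X)
alpha = np.zeros(n)
eta0, decay = 0.5, 0.999         # assumed constants; paper gives only scalings
for t in range(n):
    eta_t = eta0 * decay**t      # exponentially decaying step size
    pred = K[t] @ alpha          # f_{t-1}(x_t)
    alpha[t] -= eta_t * (pred - y[t])

# Empirical excess risk on 1000 fresh points, as in the paper's Section 7
X_test = sample_sphere(1000, d, rng)
excess_risk = np.mean((arccos_kernel(X_test, X) @ alpha - f_rho(X_test)) ** 2)
print(excess_risk)
```

In the dual representation, the update f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t} reduces to adjusting the single coefficient of K(x_t, ·), which is what the loop does.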