Learning Curves of Stochastic Gradient Descent in Kernel Regression
Authors: Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Simulations: In this section, we show our experiments with the NTK kernel κ_NTK, which shows that the convergence rate of SGD matches our theoretical result. The data is generated as follows with a fixed f_ρ: y_i = f_ρ(x_i) + ε_i, i = 1, ..., n, where the x_i are i.i.d. samples from the uniform distribution on the sphere S^d, and the ε_i are i.i.d. N(0, 1). ... We numerically approximate the excess risk by the empirical excess risk on 1000 i.i.d. sampled data points from the uniform distribution on the sphere S^d. As shown in Figure 1, the results support our theoretical findings and indicate that SGD with an exponentially decaying step size does not suffer from the saturation effect. |
| Researcher Affiliation | Academia | 1National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 2Institute for Artificial Intelligence, Peking University. Correspondence to: Cong Fang <EMAIL>. |
| Pseudocode | No | The paper describes the SGD updates in mathematical formulas, such as f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t} in Section 4.1, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the methodology described. |
| Open Datasets | No | The data is generated as follows with a fixed f_ρ: y_i = f_ρ(x_i) + ε_i, i = 1, ..., n, where the x_i are i.i.d. samples from the uniform distribution on the sphere S^d, and the ε_i are i.i.d. N(0, 1). |
| Dataset Splits | No | The paper describes generating synthetic data and approximating the excess risk on 1000 i.i.d. sampled data points, but it does not specify explicit training/validation/test splits of the kind used when reproducing experiments on pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used in the research or simulations. |
| Experiment Setup | No | The paper specifies the generation process for the regression function f_ρ and the ranges for n and d in the simulations (e.g., n from 1000 to 2000, d = n^(2/3)). It describes the form of the step-size schedules (exponentially decaying, or constant with averaging) and their theoretical scaling (e.g., initial step size η_0 = Θ(d^(γ+p) log² n ln d)), but it does not give concrete numerical values, such as the exact constant in η_0, or other training-configuration details needed for reproduction beyond these theoretical scalings. |
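The simulation described in the quoted cells can be sketched in a few lines. This is a minimal illustration, not the authors' code: the target function f_ρ, the use of the first-order arc-cosine kernel as a stand-in for the paper's NTK kernel κ_NTK, and the step-size constants `eta0` and `decay` are all assumptions, since the paper only reports theoretical scalings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d, rng):
    # i.i.d. uniform samples on the sphere S^d (unit vectors in R^{d+1})
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def arccos_kernel(X, Z):
    # First-order arc-cosine kernel on the sphere; assumed stand-in for
    # the paper's NTK kernel κ_NTK (not the paper's exact kernel).
    u = np.clip(X @ Z.T, -1.0, 1.0)
    theta = np.arccos(u)
    return (np.sin(theta) + (np.pi - theta) * u) / np.pi

n, d = 1000, 10                  # paper: n in [1000, 2000], d = n^(2/3)
X = sample_sphere(n, d, rng)
f_rho = lambda x: x[:, 0]        # hypothetical target function f_ρ
y = f_rho(X) + rng.standard_normal(n)   # y_i = f_ρ(x_i) + ε_i, ε_i ~ N(0, 1)

# Single-pass kernel SGD, f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t},
# tracked via dual coefficients alpha so that f(x) = Σ_i alpha_i K(x_i, x).
K = arccos_kernel(X, X)
alpha = np.zeros(n)
eta0, decay = 0.5, 0.999         # assumed constants; paper gives only scalings
for t in range(n):
    eta_t = eta0 * decay**t      # exponentially decaying step size
    pred = K[t] @ alpha          # f_{t-1}(x_t)
    alpha[t] -= eta_t * (pred - y[t])

# Empirical excess risk on 1000 fresh points, as in the paper's Section 7
X_test = sample_sphere(1000, d, rng)
excess_risk = np.mean((arccos_kernel(X_test, X) @ alpha - f_rho(X_test)) ** 2)
print(excess_risk)
```

In the dual representation, the update f_t = f_{t-1} - η_t (f_{t-1}(x_t) - y_t) K_{x_t} reduces to adjusting the single coefficient of K(x_t, ·), which is what the loop does.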