Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation

Authors: Marina Sheshukova, Denis Belomestny, Alain Oliviero Durmus, Eric Moulines, Aleksei Naumov, Sergey Samsonov

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 NUMERICAL RESULTS: In this section we illustrate numerically the scale of the second-order terms in equation (41) in Corollary 7. We show that, for a particular minimization problem, setting γ = n^{-1/2}, we achieve a scaling of the second-order terms in the root-MSE bound of order O(n^{-3/4}). We consider the problem min_{θ∈ℝ} f(θ), f(θ) = θ² + cos θ, with the stochastic gradient oracle F(θ, ξ) = 2θ − sin θ + ξ, where ξ ~ N(0, 1). This example satisfies assumptions A1, A2, A3(p) for any p ≥ 2. We select different sample sizes n, choose γ = 1/√n, and construct the associated estimates θ̄_n^{(γ)} and θ̄_n^{(2γ)}. A detailed description of the experimental setting is provided in Appendix E. Then for each n we compute the Richardson–Romberg estimates θ_n^{RR} from (36), together with their versions without the leading term in n, i.e. θ_n^{RR} + n^{-1} Σ_{k=n+1}^{2n} ε_{k+1}(θ⋆). We first plot ‖θ_n^{RR} − θ⋆‖² and ‖θ_n^{RR} + n^{-1} Σ_{k=n+1}^{2n} ε_{k+1}(θ⋆) − θ⋆‖², averaged over M = 320 parallel runs, in Figure 1. On the same figure we also plot the rescaled errors n‖θ_n^{RR} − θ⋆‖² and n^{3/2}‖θ_n^{RR} − θ⋆ + n^{-1} Σ_{k=n+1}^{2n} ε_{k+1}(θ⋆)‖², also averaged over M parallel runs. The plot indicates that the proper scaling of the squared norm of the remainder part is n^{-3/2}, that is, the corresponding term in the root-MSE bound for E_ν^{1/2}[‖θ_n^{RR} − θ⋆‖²] scales as O(n^{-3/4}), as predicted by Corollary 7. Figure 1: Left: Richardson–Romberg experimental error with and without the leading term n^{-1} Σ_{k=n+1}^{2n} ε_{k+1}(θ⋆). Right: the same errors after rescaling by n and n^{3/2}, respectively.
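The quoted experiment is simple to reproduce in spirit. Below is a minimal NumPy sketch, not the authors' code: it assumes the Richardson–Romberg estimate combines tail averages of two constant-step SGD runs as 2θ̄_n^{(γ)} − θ̄_n^{(2γ)} (the standard form suggested by the paper's eq. (36)), with each average taken over iterates n+1, …, 2n to match the sum Σ_{k=n+1}^{2n} in the quote. Function names are illustrative. For f(θ) = θ² + cos θ the minimizer is θ⋆ = 0.

```python
import numpy as np

def sgd_tail_average(step, n, rng, theta0=0.0):
    """Run 2n steps of constant-step SGD on f(theta) = theta^2 + cos(theta)
    with the noisy gradient F(theta, xi) = 2*theta - sin(theta) + xi,
    xi ~ N(0, 1); return the average over the last n iterates."""
    theta = theta0
    acc = 0.0
    for k in range(2 * n):
        xi = rng.standard_normal()
        theta -= step * (2.0 * theta - np.sin(theta) + xi)
        if k >= n:            # tail-average iterates n+1 .. 2n
            acc += theta
    return acc / n

def rr_estimate(n, rng):
    """Richardson-Romberg extrapolation with gamma = 1/sqrt(n):
    combine the averages at steps gamma and 2*gamma (assumed form of eq. (36))."""
    gamma = 1.0 / np.sqrt(n)
    return 2.0 * sgd_tail_average(gamma, n, rng) - sgd_tail_average(2.0 * gamma, n, rng)

rng = np.random.default_rng(0)
est = rr_estimate(4000, rng)   # should be close to theta* = 0
```

The leading error term is the noise average n^{-1} Σ ε_{k+1}(θ⋆), so the estimate fluctuates around θ⋆ = 0 at scale roughly n^{-1/2}.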
Researcher Affiliation | Academia | Marina Sheshukova^{1,5}, Denis Belomestny^{2,1}, Alain Durmus^3, Eric Moulines^{3,4}, Alexey Naumov^{1,6}, Sergey Samsonov^1. ^1 HSE University; ^2 Duisburg-Essen University; ^3 CMAP, UMR 7641, École Polytechnique; ^4 Mohamed Bin Zayed University of AI; ^5 Skolkovo Institute of Science and Technology; ^6 Steklov Mathematical Institute of Russian Academy of Sciences.
Pseudocode | No | The paper describes its methods via mathematical equations and derivations, such as (11) and (36), but does not contain a structured pseudocode or algorithm block.
Open Source Code | Yes | Code to reproduce the experiments is provided at https://github.com/svsamsonov/richardson romberg example.
Open Datasets | No | The experiments use only synthetic data: the problem min_{θ∈ℝ} f(θ), f(θ) = θ² + cos θ, with the stochastic gradient oracle F(θ, ξ) = 2θ − sin θ + ξ, where ξ ~ N(0, 1).
Dataset Splits | No | We select different sample sizes n, choose γ = 1/√n, and construct the associated estimates θ̄_n^{(γ)} and θ̄_n^{(2γ)}. We conduct M = 320 independent parallel runs to approximate the expectations.
Hardware Specification | No | The paper does not provide any specific hardware details for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | We select different sample sizes n = 250·2^k, where k = 0, …, 14, and run the SGD procedure (2) with constant step sizes γ and 2γ, selecting γ = 1/√n. Then for each n we compute the Richardson–Romberg estimates θ_n^{RR} from (36), together with their versions without the leading term in n, i.e. θ_n^{RR} + n^{-1} Σ_{k=n+1}^{2n} ε_{k+1}(θ⋆). … We conduct M = 320 independent parallel runs to approximate the expectations.
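The sweep over sample sizes can be sketched as follows. This is a self-contained, scaled-down illustration (a few values of k and fewer runs than the paper's M = 320), assuming the same RR form 2θ̄_n^{(γ)} − θ̄_n^{(2γ)} with tail averages over iterates n+1, …, 2n; chains are vectorized across the M runs for speed. Since the leading error term has variance 1/n, the empirical root-MSE should shrink roughly like n^{-1/2}.

```python
import numpy as np

def rr_mse(n, M, rng):
    """Mean squared error over M parallel runs of the Richardson-Romberg
    estimate 2*avg_gamma - avg_2gamma, gamma = 1/sqrt(n), on
    f(theta) = theta^2 + cos(theta) with minimizer theta* = 0."""
    gamma = 1.0 / np.sqrt(n)
    avgs = []
    for step in (gamma, 2.0 * gamma):
        theta = np.zeros(M)       # M chains run in parallel
        acc = np.zeros(M)
        for k in range(2 * n):
            xi = rng.standard_normal(M)
            theta -= step * (2.0 * theta - np.sin(theta) + xi)
            if k >= n:            # tail-average iterates n+1 .. 2n
                acc += theta
        avgs.append(acc / n)
    rr = 2.0 * avgs[0] - avgs[1]
    return float(np.mean(rr ** 2))

rng = np.random.default_rng(1)
rmse = {}
for k in range(4):                # the paper uses k = 0, ..., 14 and M = 320
    n = 250 * 2 ** k
    rmse[n] = np.sqrt(rr_mse(n, M=100, rng=rng))
    print(n, rmse[n])             # root-MSE shrinks roughly like n^{-1/2}
```

To reproduce the paper's Figure 1 one would additionally track the error with the leading noise average subtracted and rescale the two squared errors by n and n^{3/2}, respectively.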