Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Authors: Antoine Godichon-Baggioni, Pierre Tarrago

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6 Simulation study: In this simulation study, we consider the following scenarios: Stochastic Newton Algorithm: ... Adagrad: ... 6.1 Linear model: ... generate 50 datasets of size n = 10^5. ... In Figures 1 and 2, we analyze the evolution of the quadratic mean error of the estimators ... 6.2 Logistic regression: ... In Figures 3 and 4, we analyze the evolution of the quadratic mean error of the estimates as a function of the sample size n."
Researcher Affiliation | Academia | Antoine Godichon-Baggioni (EMAIL), Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université; Pierre Tarrago (EMAIL), Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université
Pseudocode | No | "Then, an adaptive stochastic gradient algorithm is defined recursively for all n ≥ 0 by θ_{n+1} = θ_n − γ_{n+1} A_n ∇_h g(X_{n+1}, θ_n), where θ_0 is arbitrarily chosen... The stochastic Newton algorithm is defined recursively for all n ≥ 0 by (Boyer & Godichon-Baggioni, 2020) θ_{n+1} = θ_n + γ_{n+1} S_n^{-1} (Y_{n+1} − X_{n+1}^T θ_n) X_{n+1} (6)"
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include links to a code repository.
Open Datasets | No | "6.1 Linear model: We consider the linear model: Y = X^T θ + ϵ, where X ∼ N(0, diag(1, ..., d)) and ϵ ∼ N(0, 1). ... 6.2 Logistic regression: We now consider the logistic regression case: Y | X ∼ B(π(θ^T X)), where X ∼ N(0, diag(1, ..., d)) and π(x) = e^x / (1 + e^x)." The paper describes how the data is generated, but does not provide access information or a citation for a publicly available dataset instance.
Dataset Splits | No | "In the following experiments, we set d = 10 and generate 50 datasets of size n = 10^5." The paper states the total size of the generated datasets but does not specify any training, validation, or test splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or cloud resources used for running the experiments.
Software Dependencies | No | "In practice, we generate a sample of size 10^7 and approximate the minimizer using the R function glmnet." Only the R function glmnet is mentioned, without a specific version number.
Experiment Setup | Yes | "Stochastic Newton Algorithm: We set c_γ = 1 and initialize A_0 = (1/10) I_d to stabilize the algorithm during the first iterations. Additionally, and again for stabilization purposes, as suggested in Boyer & Godichon-Baggioni (2020), we use a modified step size, taking γ_n = c_γ / (n + 20)^γ. We consider: the choice of γ: γ = 0.66 or γ = 0.75; the use of truncation or not, with c_β = 1 and β = γ − 1/2, while employing the Frobenius norm. Adagrad: We set c_γ = 1 and initialize A_0 = I_d. For stabilization purposes, as suggested in Boyer & Godichon-Baggioni (2020), we use a modified step size, taking γ_n = c_γ / (n + 20)^γ. We consider: the choice of γ: γ = 0.5 or γ = 0.75; the use of truncation or not, with c_β = 1, λ_0 = 1, β = 0.25 (resp. 0.125) and λ = 0.385 (resp. 0.25) if γ = 0.75 (resp. if γ = 0.5). ... In the following experiments, we set d = 10 and generate 50 datasets of size n = 10^5. Moreover, we consider random initializations θ_0 = θ + U, where U ∼ N(0, I_d). ... In addition, we set σ = 0.1 and denote by θ the minimizer of G_σ ... Moreover, we generate 50 datasets of size n = 10^5 and we consider random initializations θ_0 = θ + U, where U ∼ N(0, I_d)."
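The generative models quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration under the stated distributions; all variable names and the sample sizes are our own choices, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 1000
theta = rng.normal(size=d)

# X ~ N(0, diag(1, ..., d)): independent Gaussian coordinates with variances 1, ..., d.
X = rng.normal(size=(n, d)) * np.sqrt(np.arange(1, d + 1))

# Linear model (Section 6.1): Y = X^T theta + eps, with eps ~ N(0, 1).
Y_linear = X @ theta + rng.normal(size=n)

# Logistic model (Section 6.2): Y | X ~ Bernoulli(pi(theta^T X)),
# with pi(x) = e^x / (1 + e^x) the logistic function.
p = 1.0 / (1.0 + np.exp(-(X @ theta)))
Y_logistic = rng.binomial(1, p)
```

Each of the 50 replicated datasets in the experiments corresponds to one such draw of (X, Y) with a fresh random seed.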
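The stochastic Newton recursion (6) quoted in the Pseudocode row can be sketched on the linear model as follows. This is our own illustration under stated assumptions: we maintain the inverse of the summed Hessian estimate with a Sherman-Morrison rank-one update and rescale it to approximate the averaged Hessian inverse; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 20000
theta_star = rng.normal(size=d)

# Linear model of Section 6.1: X ~ N(0, diag(1, ..., d)), eps ~ N(0, 1).
X = rng.normal(size=(n, d)) * np.sqrt(np.arange(1, d + 1))
Y = X @ theta_star + rng.normal(size=n)

# Recursion (6): theta <- theta + gamma_{n+1} S_n^{-1} (Y - X^T theta) X,
# where S_n estimates the Hessian.  We track the inverse of the *sum*
# S = S_0 + sum_i x_i x_i^T and rescale by (i + 1) to mimic the average.
theta = np.zeros(d)
S_inv = 10.0 * np.eye(d)  # inverse of the stabilizing initialization S_0 = (1/10) I_d
for i in range(n):
    x, y = X[i], Y[i]
    # Sherman-Morrison update of S_inv for the rank-one change S <- S + x x^T.
    Sx = S_inv @ x
    S_inv -= np.outer(Sx, Sx) / (1.0 + x @ Sx)
    # Modified step size gamma_n = c_gamma / (n + 20)^gamma with c_gamma = 1, gamma = 0.75.
    gamma = 1.0 / (i + 21) ** 0.75
    theta += gamma * (i + 1) * (S_inv @ x) * (y - x @ theta)
```

The rescaling by (i + 1) keeps the preconditioner of constant order while the step size γ_n alone controls the decay, matching the structure of the quoted recursion.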
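The modified step size and an Adagrad-style diagonal preconditioner from the Experiment Setup row can be sketched as follows. This is a simplified illustration: the averaging of squared gradients, the function names, and the omission of the truncation variant are our assumptions, not the paper's specification.

```python
import numpy as np

def step_size(n, c_gamma=1.0, gamma=0.75):
    """Modified step size gamma_n = c_gamma / (n + 20)^gamma (stabilized denominator)."""
    return c_gamma / (n + 20) ** gamma

def adagrad_linear(X, Y, gamma=0.75):
    """Adagrad-style recursion for the linear model, initialized with A_0 = I_d."""
    d = X.shape[1]
    theta = np.zeros(d)
    grad_sq = np.ones(d)  # running sum of squared gradients; ones encode A_0 = I_d
    for i, (x, y) in enumerate(zip(X, Y)):
        g = -(y - x @ theta) * x  # gradient of the squared loss (y - x^T theta)^2 / 2
        grad_sq += g * g
        # Diagonal preconditioner: inverse square root of the averaged squared gradients.
        theta -= step_size(i, gamma=gamma) * g / np.sqrt(grad_sq / (i + 2))
    return theta

rng = np.random.default_rng(0)
d, n = 10, 20000
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d)) * np.sqrt(np.arange(1, d + 1))
Y = X @ theta_star + rng.normal(size=n)
theta_hat = adagrad_linear(X, Y)
```

The constant 20 in the denominator keeps the first steps small, which is the stabilization role attributed to the modified step size in the quoted setup.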
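The figures quoted in the Research Type row track the quadratic mean error of the estimates across the 50 generated datasets; a minimal sketch of that metric (the function name and toy inputs are ours):

```python
import numpy as np

def quadratic_mean_error(estimates, theta_star):
    """Mean of ||theta_hat - theta_star||^2 across datasets.

    estimates: array of shape (n_datasets, d), one estimate per generated dataset.
    """
    diffs = estimates - theta_star
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy check with two "datasets" in dimension 3.
theta_star = np.zeros(3)
estimates = np.array([[0.1, 0.0, 0.0],
                      [0.0, 0.2, 0.0]])
err = quadratic_mean_error(estimates, theta_star)  # (0.01 + 0.04) / 2 = 0.025
```

Plotting this quantity against the sample size n, one curve per algorithm variant, reproduces the structure of the evaluation described for Figures 1-4.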