Non asymptotic analysis of adaptive stochastic gradient algorithms and applications
Authors: Antoine Godichon-Baggioni, Pierre Tarrago
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Simulation study): "In this simulation study, we consider the following scenarios: Stochastic Newton algorithm: ... Adagrad: ..." Section 6.1 (Linear model): "... generate 50 datasets of size n = 10^5. ... In Figures 1 and 2, we analyze the evolution of the quadratic mean error of the estimators ..." Section 6.2 (Logistic regression): "In Figures 3 and 4, we analyze the evolution of the quadratic mean error of the estimates as a function of the sample size n." |
| Researcher Affiliation | Academia | Antoine Godichon-Baggioni (EMAIL), Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université; Pierre Tarrago (EMAIL), Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université |
| Pseudocode | No | Then, an adaptive stochastic gradient algorithm is defined recursively for all $n \geq 0$ by $\theta_{n+1} = \theta_n - \gamma_{n+1} A_n \nabla_h g(X_{n+1}, \theta_n)$, where $\theta_0$ is arbitrarily chosen... The stochastic Newton algorithm is defined recursively for all $n \geq 0$ by (Boyer & Godichon-Baggioni, 2020) $\theta_{n+1} = \theta_n + \gamma_{n+1} S_n^{-1} \left( Y_{n+1} - X_{n+1}^T \theta_n \right) X_{n+1}$ (6) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include links to a code repository. |
| Open Datasets | No | Section 6.1 (Linear model): "We consider the linear model $Y = X^T \theta + \epsilon$, where $X \sim \mathcal{N}(0, \mathrm{diag}(1, \ldots, d))$ and $\epsilon \sim \mathcal{N}(0, 1)$." Section 6.2 (Logistic regression): "We now consider the logistic regression case $Y \mid X \sim \mathcal{B}\left(\pi(\theta^T X)\right)$, where $X \sim \mathcal{N}(0, \mathrm{diag}(1, \ldots, d))$ and $\pi(x) = \frac{e^x}{1 + e^x}$." The paper describes how the data is generated, but does not provide access information or a citation for a publicly available dataset instance. |
| Dataset Splits | No | In the following experiments, we set d = 10 and generate 50 datasets of size n = 10^5. The paper mentions the total size of the generated datasets but does not specify any training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or cloud resources used for running experiments. |
| Software Dependencies | No | In practice, we generate a sample of size 10^7 and approximate the minimizer using the R function glmnet. Only "R function glmnet" is mentioned without a specific version number. |
| Experiment Setup | Yes | Stochastic Newton algorithm: "We set $c_\gamma = 1$ and initialize $A_0 = \frac{1}{10} I_d$ to stabilize the algorithm during the first iterations. Additionally, and again for stabilization purposes, as suggested in Boyer & Godichon-Baggioni (2020), we use a modified step size, taking $\gamma_n = \frac{c_\gamma}{(n+20)^\gamma}$. We consider: the choice of $\gamma$: $\gamma = 0.66$ or $\gamma = 0.75$; the use of truncation or not, with $c_\beta = 1$ and $\beta = \gamma - 1/2$, while employing the Frobenius norm." Adagrad: "We set $c_\gamma = 1$ and initialize $A_0 = I_d$. For stabilization purposes, as suggested in Boyer & Godichon-Baggioni (2020), we use a modified step size, taking $\gamma_n = \frac{c_\gamma}{(n+20)^\gamma}$. We consider: the choice of $\gamma$: $\gamma = 0.5$ or $\gamma = 0.75$; the use of truncation or not, with $c_\beta = 1$, $\lambda_0 = 1$, $\beta = 0.25$ (resp. $0.125$) and $\lambda = 0.385$ (resp. $0.25$) if $\gamma = 0.75$ (resp. if $\gamma = 0.5$). ... In the following experiments, we set $d = 10$ and generate 50 datasets of size $n = 10^5$. Moreover, we consider random initializations $\theta_0 = \theta + U$, where $U \sim \mathcal{N}(0, I_d)$. ... In addition, we set $\sigma = 0.1$ and denote by $\theta$ the minimizer of $G_\sigma$ ... Moreover, we generate 50 datasets of size $n = 10^5$ and we consider random initializations $\theta_0 = \theta + U$, where $U \sim \mathcal{N}(0, I_d)$." |
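
The stochastic Newton recursion (6) on the linear model of Section 6.1, with the modified step size $\gamma_n = c_\gamma/(n+20)^\gamma$, can be sketched as follows. This is a minimal illustration of the quoted setup, not the paper's implementation: the sample size ($10^4$ rather than $10^5$), the random seed, the Hessian estimate `S` (a running sum of outer products plus an identity prior, solved directly rather than updated recursively), and the absence of truncation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10
n = 10_000                      # the paper uses n = 10^5; reduced here
theta_star = rng.standard_normal(d)

# Linear model of Section 6.1: X ~ N(0, diag(1, ..., d)), eps ~ N(0, 1).
X = rng.standard_normal((n, d)) * np.sqrt(np.arange(1, d + 1))
Y = X @ theta_star + rng.standard_normal(n)

# Stochastic Newton recursion (6) with the modified step
# gamma_n = c_gamma / (n + 20)^gamma and gamma = 0.75 (one of the
# two reported choices).
c_gamma, gamma = 1.0, 0.75
u = rng.standard_normal(d)
theta = theta_star + u          # theta_0 = theta + U, U ~ N(0, I_d)
err0 = np.sum(u ** 2)           # initial squared error
S = np.eye(d) / 10.0            # identity prior, echoing A_0 = (1/10) I_d
for k in range(n):
    x, y = X[k], Y[k]
    S += np.outer(x, x)         # running (unnormalized) Hessian estimate
    step = c_gamma / (k + 21) ** gamma
    theta = theta + step * np.linalg.solve(S / (k + 2), (y - x @ theta) * x)

err = np.sum((theta - theta_star) ** 2)
```

Averaging `S` over the iteration count makes the preconditioner approximate the population Hessian $\mathrm{diag}(1, \ldots, d)$, so the squared error drops well below its initial value after a few thousand samples.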
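
The Adagrad scenario on the logistic model of Section 6.2 can be sketched in the same way. Again this is only a hedged sketch: the sample size and seed are assumptions, truncation is omitted, and the diagonal conditioner below is the textbook Adagrad choice, which may differ from the exact matrix $A_n$ used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 10
n = 10_000                      # the paper uses n = 10^5; reduced here
theta_star = rng.standard_normal(d)

# Logistic model of Section 6.2: Y | X ~ B(pi(theta^T X)),
# pi(x) = e^x / (1 + e^x), X ~ N(0, diag(1, ..., d)).
X = rng.standard_normal((n, d)) * np.sqrt(np.arange(1, d + 1))
p = 1.0 / (1.0 + np.exp(-(X @ theta_star)))
Y = (rng.random(n) < p).astype(float)

# Adagrad with the modified step gamma_n = c_gamma / (n + 20)^gamma
# and gamma = 0.5 (one of the two reported choices).
c_gamma, gamma = 1.0, 0.5
u = rng.standard_normal(d)
theta = theta_star + u          # theta_0 = theta + U, U ~ N(0, I_d)
err0 = np.sum(u ** 2)           # initial squared error
G = np.ones(d)                  # running sum of squared gradients; A_0 = I_d
for k in range(n):
    x, y = X[k], Y[k]
    grad = (1.0 / (1.0 + np.exp(-(x @ theta))) - y) * x   # logistic gradient
    G += grad ** 2
    step = c_gamma / (k + 21) ** gamma
    theta = theta - step * grad / np.sqrt(G)

err = np.sum((theta - theta_star) ** 2)
```

Since $G$ grows roughly linearly in $k$, the effective per-coordinate step decays like $1/k$, which is why convergence on the logistic model is slower than in the preconditioned Newton sketch.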