Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
Authors: Ya-Chi Chu, Wenzhi Gao, Yinyu Ye, Madeleine Udell
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, HDM-HB often matches the performance of L-BFGS, an efficient and practical quasi-Newton method, using less memory and cheaper iterations. This section conducts numerical experiments to validate the empirical performance of hypergradient descent. We compare HDM-Best (see Section 5.1 below) with different adaptive optimization algorithms. |
| Researcher Affiliation | Academia | 1Department of Mathematics, Stanford University 2ICME, Stanford University 3Shanghai Jiao Tong University and Shanghai Institute for Mathematics and Interdisciplinary Sciences 4Management Science and Engineering, Stanford University. Correspondence to: Ya-Chi Chu <EMAIL>, Wenzhi Gao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Hypergradient Descent Method (HDM): input initial point x1, P1 ∈ P (not necessarily PSD); for k = 1, 2, ... do ... Algorithm 3 HDM-Best: input starting point x0 = x1, P = S^n_+ ∩ D, B = [0, 0.9995], initial diagonal preconditioner P1 ∈ S^n_+ ∩ D, initial scalar momentum parameter β1 = 0.95, AdaGrad stepsizes ηp, ηb > 0, AdaGrad diagonal matrix U1 = 0, AdaGrad momentum scalar v1 = 0, τ > 0; for k = 1, 2, ... do ... |
| Open Source Code | Yes | This section highlights the major components of our most competitive variant HDM-Best. The algorithm and a more detailed explanation are available in Appendix A. The implementation is available at https://github.com/udellgroup/hypergrad. |
| Open Datasets | Yes | We test HDM-Best on deterministic convex problems. We adopt two convex optimization tasks in machine learning: support vector machine (Lee and Mangasarian, 2001) and logistic regression (Hastie, 2009). The testing datasets are obtained from LIBSVM (Chang and Lin, 2011). |
| Dataset Splits | No | The paper uses datasets obtained from LIBSVM but does not explicitly provide details about training/test/validation splits, exact percentages, or sample counts for reproduction. The 'Testing Configurations' section specifies maximum oracle access, initial point generation, and stopping criteria, but not dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running experiments. Mentions of 'memory size' refer to algorithmic memory, not hardware. |
| Software Dependencies | No | The paper mentions using 'scipy' for BFGS and L-BFGS-Mk in the experimental setup, but it does not specify a version number for 'scipy' or any other software libraries used, which is necessary for reproducibility. |
| Experiment Setup | Yes | For HDM-Best, we search for the optimal ηp within {0.1/L, 1/L, 10/L, 100/L} and ηb within {1, 3, 5, 10, 100}. Stepsizes in GD, GD-HB, AGD-CVX, and AGD-SCVX are all set to 1/L. The Adam stepsize is chosen within the set {1/L, 10^-3, 10^-2, 10^-1, 1, 10}, with β1 = 0.9, β2 = 0.999. Maximum oracle access: we allow a maximum of 1000 gradient oracles for each algorithm. Initial point: all the algorithms are initialized from the same starting point, generated from the normal distribution N(0, In) and normalized to have unit length. Stopping criterion: algorithms stop if f(xk) − f* ≤ 10^-4. |
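The hypergradient mechanism behind the pseudocode quoted above can be illustrated with a minimal sketch. This is not the paper's HDM-Best (which adapts a diagonal preconditioner and a momentum parameter via AdaGrad-style online updates); it is the basic scalar-stepsize hypergradient update that HDM builds on, with the function `hypergradient_descent`, its parameter names, and the quadratic test problem all being illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def hypergradient_descent(grad, x0, alpha0=1e-3, beta=1e-6, iters=100):
    """Scalar-stepsize hypergradient descent sketch (assumed interface).

    For plain gradient descent x_{k+1} = x_k - alpha * grad(x_k), the
    derivative of f(x_k) with respect to alpha is
    -<grad(x_k), grad(x_{k-1})>, so the stepsize itself is updated by
    an online gradient step: alpha grows while successive gradients
    align and shrinks when they point in opposing directions.
    """
    x, alpha = np.asarray(x0, dtype=float), alpha0
    g_prev = np.zeros_like(x)
    for _ in range(iters):
        g = grad(x)
        # hypergradient step on the stepsize itself
        alpha += beta * (g @ g_prev)
        x = x - alpha * g
        g_prev = g
    return x, alpha
```

For example, on the quadratic f(x) = ||x||^2 / 2 (so grad(x) = x), successive gradients stay aligned, the stepsize increases from its initial value, and the iterates converge to the minimizer at the origin.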