Automatic Differentiation of Optimization Algorithms with Time-Varying Updates

Authors: Sheheryar Mehmood, Peter Ochs

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test our results, we provide numerical demonstrations on a few examples from classical Machine Learning. These include lasso regression, that is, ... We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and β_k := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... In Figure 1, the top row shows the median error plots of the five algorithms and the bottom row shows the errors of the corresponding derivatives with the same colour.
Researcher Affiliation | Academia | Department of Mathematics & Computer Science, Saarland University, Saarbrücken, Germany. Correspondence to: Sheheryar Mehmood <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Proximal Gradient with Extrapolation). Initialization: x^(0) = x^(−1) ∈ X, u ∈ U, 0 < α_min ≤ α_max < 2/L. Parameters: α_k ∈ [α_min, α_max] and β_k ∈ [0, 1] for all k ∈ ℕ. Update for k ≥ 0: y^(k) := (1 + β_k) x^(k) − β_k x^(k−1); w^(k) := y^(k) − α_k ∇_x f(y^(k), u); x^(k+1) := P_{α_k g}(w^(k), u).
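As a rough illustration (not the authors' released code), the sketch below unrolls Algorithm 1 in PyTorch for a lasso-type objective and differentiates the final iterate with respect to the parameter u by reverse-mode AD, which is the unrolling setting the paper analyses. The objective f(x, u) = ½‖Ax − b‖², the choice g = u‖x‖₁ with its soft-thresholding proximal map, the fixed 1/L step size, and all names are illustrative assumptions.

```python
import torch

def soft_threshold(w, tau):
    # Proximal map of tau * ||.||_1 (soft-thresholding); assumed form of P_{alpha_k g}.
    return torch.sign(w) * torch.clamp(w.abs() - tau, min=0.0)

def unrolled_pgd_extrapolation(A, b, u, num_iters=200):
    # Unrolled Algorithm 1 for min_x 0.5*||Ax - b||^2 + u*||x||_1,
    # kept differentiable with respect to the regularization weight u.
    L = torch.linalg.matrix_norm(A, ord=2) ** 2        # Lipschitz constant of grad f
    x_prev = torch.zeros(A.shape[1], dtype=A.dtype)
    x = torch.zeros(A.shape[1], dtype=A.dtype)
    for k in range(num_iters):
        alpha_k = 1.0 / L                              # one admissible step-size choice
        beta_k = max(0.0, (k - 1) / (k + 5))           # extrapolation, clipped to [0, 1]
        y = (1 + beta_k) * x - beta_k * x_prev         # extrapolation step
        w = y - alpha_k * (A.T @ (A @ y - b))          # gradient step on f(., u)
        x_prev, x = x, soft_threshold(w, alpha_k * u)  # proximal step
    return x

A = torch.randn(30, 50, dtype=torch.float64)
b = torch.randn(30, dtype=torch.float64)
u = torch.tensor(0.1, dtype=torch.float64, requires_grad=True)
x_K = unrolled_pgd_extrapolation(A, b, u)
x_K.sum().backward()                                   # reverse-mode AD through the loop
print(u.grad)                                          # derivative of sum(x^(K)) w.r.t. u
```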
Open Source Code | No | The paper mentions autograd libraries such as PyTorch, TensorFlow, and JAX as tools used, but does not provide access to the authors' own implementation of the described methodology.
Open Datasets | Yes | We solve (16) for 50 randomly generated datasets, (17) for 50 perturbed instances of the MADELON dataset (Dua & Graff, 2017), and (18) for a single instance of the CIFAR10 dataset (Krizhevsky, 2009).
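The datasets named above are publicly available; one way to fetch them is sketched below, assuming the OpenML copy of MADELON (via scikit-learn) and the torchvision CIFAR-10 loader, which may differ from the exact instances and perturbations the authors used. The synthetic shapes are placeholders.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from torchvision.datasets import CIFAR10

rng = np.random.default_rng(0)
# Problem (16): 50 randomly generated datasets (shapes here are placeholders).
synthetic = [(rng.standard_normal((100, 200)), rng.standard_normal(100)) for _ in range(50)]

# Problem (17): MADELON (the paper quotes M = 2,000 samples and N = 501 features).
madelon = fetch_openml("madelon", version=1, as_frame=False)
X_mad, y_mad = madelon.data, madelon.target

# Problem (18): CIFAR-10 training set, 50,000 images with 32 x 32 x 3 features each.
cifar = CIFAR10(root="./data", train=True, download=True)
X_cif = np.asarray(cifar.data, dtype=np.float64).reshape(len(cifar.data), -1)
```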
Dataset Splits | No | For (17), we use the MADELON dataset with M = 2,000 samples and N = 501 features. ... For (18), we use the CIFAR10 dataset with M = 50,000 samples and N = 32 × 32 × 3 features. The paper specifies the total number of samples for these datasets but does not provide specific training/validation/test splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) are provided in the paper for running the experiments.
Software Dependencies | No | A crucial advantage of AD is that it provides a nice blackbox implementation thanks to the powerful autograd libraries included in PyTorch (Paszke et al., 2019), TensorFlow (Abadi et al., 2016), and JAX (Bradbury et al., 2018). While these software packages are mentioned, no specific version numbers are provided for their usage in the experiments.
Experiment Setup | Yes | We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and β_k := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... For each problem, we run PGD with four different choices of step size, namely, (i) α_k = 2/(L + m) for (17) and α_k = 1/L for (16), (ii) α_k ∈ U(0, 2/(3L)), (iii) α_k ∈ U(2/(3L), 4/(3L)), and (iv) α_k ∈ U(4/(3L), 2/L), for each k ∈ ℕ. We also run APG with α_k = 1/L and β_k = (k − 1)/(k + 5). Before starting each algorithm, we obtain w^(0) ∈ B_{10^(−2)}(w^∗) by partially solving each problem through APG.
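One plausible reading of these step-size choices is sketched below; the uniform-sampling interpretation of U(·, ·), the clipping of β_k at zero, and the function names are assumptions made for illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def pgd_step_size(choice, L, m=0.0):
    # The four step-size schedules quoted above; L is the Lipschitz constant of
    # grad f and m a strong-convexity constant (used for problem (17) only).
    if choice == 1:
        return 2.0 / (L + m) if m > 0 else 1.0 / L              # fixed step size
    if choice == 2:
        return rng.uniform(0.0, 2.0 / (3.0 * L))                # alpha_k ~ U(0, 2/(3L))
    if choice == 3:
        return rng.uniform(2.0 / (3.0 * L), 4.0 / (3.0 * L))    # alpha_k ~ U(2/(3L), 4/(3L))
    if choice == 4:
        return rng.uniform(4.0 / (3.0 * L), 2.0 / L)            # alpha_k ~ U(4/(3L), 2/L)
    raise ValueError("choice must be in {1, 2, 3, 4}")

def apg_params(k, L):
    # APG: fixed step 1/L and extrapolation beta_k = (k - 1)/(k + 5), clipped at 0.
    return 1.0 / L, max(0.0, (k - 1) / (k + 5))
```

Under this reading, the warm start w^(0) ∈ B_{10^(−2)}(w^∗) would correspond to running APG until the iterate lies within distance 10^(−2) of the solution before switching to the schedule under study.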