Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
Authors: Sheheryar Mehmood, Peter Ochs
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test our results, we provide numerical demonstration on a few examples from classical Machine Learning. These include lasso regression, that is, ... We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and βk := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... In Figure 1, the top row shows the median error plots of the five algorithms and the bottom row shows the errors of the corresponding derivatives with the same colour. |
| Researcher Affiliation | Academia | 1Department of Mathematics & Computer Science, Saarland University, Saarbrücken, Germany. Correspondence to: Sheheryar Mehmood <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Proximal Gradient with Extrapolation. Initialization: x(0) = x(−1) ∈ X, u ∈ U, 0 < α ≤ ᾱ < 2/L. Parameters: (αk)k∈N ⊂ [α, ᾱ] and (βk)k∈N ⊂ [0, 1]. Update for k ≥ 0: y(k) := (1 + βk)x(k) − βk x(k−1); w(k) := y(k) − αk ∇x f(y(k), u); x(k+1) := P_{αk g}(w(k), u). |
| Open Source Code | No | The paper mentions autograd libraries like PyTorch, TensorFlow, and JAX as tools used, but does not provide specific access to the authors' own implementation code for the methodology described. |
| Open Datasets | Yes | We solve (16) for 50 randomly generated datasets, (17) for 50 perturbed instances of MADELON dataset (Dua & Graff, 2017), and (18) for a single instance of CIFAR10 dataset (Krizhevsky, 2009). |
| Dataset Splits | No | For (17), we use MADELON dataset with M = 2,000 samples and N = 501 features. ... For (18), we use CIFAR10 dataset with M = 50,000 samples and N = 32 × 32 × 3 features. The paper specifies the total number of samples for these datasets but does not provide specific training/validation/test splits. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided in the paper for running the experiments. |
| Software Dependencies | No | A crucial advantage of AD is that it provides a nice blackbox implementation thanks to the powerful autograd libraries included in PyTorch (Paszke et al., 2019), TensorFlow (Abadi et al., 2016), and JAX (Bradbury et al., 2018). While these software packages are mentioned, no specific version numbers are provided for their usage in the experiments. |
| Experiment Setup | Yes | We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and βk := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... For each problem, we run PGD with four different choices of step size, namely, (i) αk = 2/(L + m) for (17) and αk = 1/L for (16), (ii) αk ∈ U(0, 2/(3L)), (iii) αk ∈ U(2/(3L), 4/(3L)), and (iv) αk ∈ U(4/(3L), 2/L), for each k ∈ N. We also run APG with αk = 1/L and βk = (k − 1)/(k + 5). Before starting each algorithm, we obtain w(0) ∈ B_{10^−2}(w*) by partially solving each problem through APG. |
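The pseudocode and experiment-setup rows above describe Algorithm 1 (proximal gradient with extrapolation) and the APG schedule αk = 1/L, βk = (k − 1)/(k + 5). As a minimal illustration of that update shape, here is a NumPy sketch applied to lasso regression, one of the paper's three example problems. The function names and the specific lasso instance are our own assumptions for illustration; this is not the authors' implementation, and it omits the time-varying step-size variants and the derivative (AD) computations the paper studies.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding); this is
    # P_{alpha*g} in Algorithm 1 when g is the l1 penalty.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def apg_lasso(A, b, lam, num_iters=500):
    """Proximal gradient with extrapolation on the lasso problem
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1,
    using alpha_k = 1/L and beta_k = (k - 1)/(k + 5) as in the
    paper's APG runs (hypothetical helper, illustrative only)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
    alpha = 1.0 / L
    x_prev = x = np.zeros(A.shape[1])    # x(0) = x(-1), as in Algorithm 1
    for k in range(num_iters):
        # At k = 0, x_prev == x, so the (negative) beta is vacuous.
        beta = (k - 1) / (k + 5)
        y = (1 + beta) * x - beta * x_prev        # extrapolation step
        w = y - alpha * A.T @ (A @ y - b)         # gradient step on f
        x_prev, x = x, soft_threshold(w, alpha * lam)  # proximal step
    return x
```

With βk ≡ 0 the same loop reduces to plain PGD, matching the algorithm's two regimes in the paper's experiments.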