Learning to Warm-Start Fixed-Point Optimization Algorithms
Authors: Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts. Keywords: learning to optimize, fixed-point problems, warm start, generalization bounds, parametric convex optimization. (...) Section 6 presents various numerical benchmarks. |
| Researcher Affiliation | Collaboration | Rajiv Sambharya EMAIL Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA (...) Georgina Hall EMAIL Decision Sciences, INSEAD, Fontainebleau, France (...) Brandon Amos EMAIL Meta AI, New York City, NY, USA (...) Bartolomeo Stellato EMAIL Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA |
| Pseudocode | No | The paper describes algorithms using mathematical formulations (e.g., Table 1 shows the iterates z^{i+1} = T_θ(z^i) for various algorithms), and illustrates the architecture (Figure 2), but does not contain a dedicated, structured pseudocode block or algorithm section with explicit step-by-step instructions. |
| Open Source Code | Yes | The code to reproduce our results is available at https://github.com/stellatogrp/l2ws. |
| Open Datasets | Yes | We consider handwritten letters from the EMNIST dataset (Cohen et al., 2017). |
| Dataset Splits | Yes | We use 10000 training problems and evaluate on 1000 test problems for the examples except the first one in Section 6.1. |
| Hardware Specification | No | All computations were run on the Princeton HPC Della Cluster and each example could be trained under 5 hours. |
| Software Dependencies | No | We implemented our architecture in the JAX library (Bradbury et al., 2018) using the Adam (Kingma and Ba, 2015) optimizer to train. (...) Importantly, we code exact replicas of the OSQP and SCS algorithms in JAX. |
| Experiment Setup | Yes | We conduct a hyperparameter sweep over learning rates of either 10^-3 or 10^-4, and architectures with 0, 1, or 2 layers with 500 neurons each. We decay the learning rate by a factor of 5 when the training loss fails to decrease over a window of 10 epochs. |
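To make the fixed-point iteration z^{i+1} = T_θ(z^i) and the value of a warm start concrete, here is a minimal toy sketch, not the authors' code: the operator T is a gradient step for the parametric linear system P z = x, and a point near the true fixed point stands in for the network's warm-start prediction. P, the step size, and the warm-start rule are all illustrative assumptions.

```python
import numpy as np

# Toy parametric fixed-point problem: find z with P z = x, for a given
# parameter x, by iterating z^{i+1} = T(z^i).
P = np.array([[2.0, 0.0], [0.0, 4.0]])
step = 0.2  # chosen so T is a contraction (step < 2 / lambda_max(P))

def T(z, x):
    # One fixed-point iteration: a gradient step toward the solution of P z = x.
    return z - step * (P @ z - x)

def run(z0, x, k):
    # Apply k iterations starting from z0.
    z = z0
    for _ in range(k):
        z = T(z, x)
    return z

x = np.array([1.0, 2.0])
z_star = np.linalg.solve(P, x)       # true fixed point of T(., x)
z_cold = run(np.zeros(2), x, 5)      # cold start, 5 iterations
z_warm = run(z_star + 0.01, x, 5)    # warm start near z*, 5 iterations
```

In the paper, the warm start is produced by a learned map from the problem parameter to an initial iterate; here a small perturbation of the exact solution plays that role, so the same iteration budget lands much closer to the fixed point than the cold start does.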
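The experiment setup quote describes a plateau-triggered schedule: divide the learning rate by 5 whenever the training loss fails to decrease over a 10-epoch window. A minimal sketch of that rule (function and variable names are illustrative, not from the paper's code):

```python
def plateau_decay(losses, lr0=1e-3, factor=5.0, window=10):
    # Track the best loss seen so far; after `window` consecutive epochs
    # without improvement, divide the learning rate by `factor`.
    lr, best, stall = lr0, float("inf"), 0
    lrs = []
    for loss in losses:
        if loss < best:
            best, stall = loss, 0
        else:
            stall += 1
            if stall >= window:
                lr /= factor
                stall = 0
        lrs.append(lr)
    return lrs

# Example: loss improves for 3 epochs, then plateaus for 10 epochs.
schedule = plateau_decay([1.0, 0.9, 0.8] + [0.8] * 10)
```

The final entries of `schedule` show the drop from 1e-3 to 2e-4 once the 10-epoch window elapses; in practice this logic is typically handled by the training library's own plateau scheduler.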