Data-Driven Performance Guarantees for Classical and Learned Optimizers

Authors: Rajiv Sambharya, Bartolomeo Stellato

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments in signal processing, control, and meta-learning showcase the ability of our framework to provide strong generalization guarantees for both classical and learned optimizers given a fixed budget of iterations. For classical optimizers, our bounds, which hold with high probability, are much tighter than those that worst-case guarantees provide. For learned optimizers, our bounds outperform the empirical outcomes observed in their non-learned counterparts.
Researcher Affiliation | Academia | Rajiv Sambharya (EMAIL), Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA; Bartolomeo Stellato (EMAIL), Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA
Pseudocode | Yes | Algorithm 1: PAC-Bayes Learning to solve problem (15)... Algorithm 2: Calibrating the PAC-Bayes bounds
Open Source Code | Yes | The code to reproduce our results is available at https://github.com/stellatogrp/data_driven_optimizer_guarantees.
Open Datasets | Yes | We consider handwritten letters from the EMNIST dataset (Cohen et al., 2017).
Dataset Splits | Yes | We use 50000 training samples and evaluate on 1000 test problems in each example... We pick the number of datapoints in the training and test sets to be Ktrain = 5 and Ktest = 100, respectively.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, clock speeds, memory) are provided. The acknowledgements mention "Princeton Research Computing resources at Princeton University", which is too general.
Software Dependencies | No | The paper mentions several software components, such as JAX, JAXOPT, Adam, OSQP, SCS, and the PEPit toolbox, but does not provide specific version numbers for any of them. For example, "JAX (Bradbury et al., 2018)" refers to a publication year, not a software version number.
Experiment Setup | Yes | Image deblurring: "The hyperparameter weighting term is ρ = 10^-4." Kalman filtering: "... where ns = 4, no = 2, nu = 2, µ = 2, ρ = 2, and T = 50." Sparse coding: "We take a matrix with dimensions m = 256, n = 512, and pick the number of algorithm steps to be K = 10." MAML: "The neural network consists of two hidden layers of size 40 each with ReLU activations. The step size γ is 0.01, and we unroll 2 steps during training."
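The per-benchmark hyperparameters quoted above can be collected into one place for reference. The sketch below is purely illustrative: the dictionary layout, key names, and the interpretations in the comments (e.g., reading ns/no/nu as state/observation/input dimensions) are this summary's own assumptions, not structures taken from the paper or its codebase; only the numeric values come from the quoted text.

```python
# Hypothetical summary of the hyperparameters reported for each experiment.
# Key names and comment interpretations are illustrative assumptions; the
# numbers themselves are taken from the quoted experiment-setup text.
experiment_setup = {
    "image_deblurring": {
        "rho": 1e-4,  # hyperparameter weighting term
    },
    "kalman_filtering": {
        "n_s": 4,   # assumed: state dimension
        "n_o": 2,   # assumed: observation dimension
        "n_u": 2,   # assumed: input dimension
        "mu": 2,
        "rho": 2,
        "T": 50,    # assumed: horizon length
    },
    "sparse_coding": {
        "m": 256,   # matrix rows
        "n": 512,   # matrix columns
        "K": 10,    # number of algorithm steps
    },
    "maml": {
        "hidden_layers": [40, 40],  # two hidden layers, ReLU activations
        "step_size": 0.01,          # gamma
        "unrolled_steps": 2,        # steps unrolled during training
    },
}
```

Such a table makes it easy to spot what the audit flags as missing: the values are stated, but no hardware or software versions accompany them.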