Data-Driven Performance Guarantees for Classical and Learned Optimizers
Authors: Rajiv Sambharya, Bartolomeo Stellato
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments in signal processing, control, and meta-learning showcase the ability of our framework to provide strong generalization guarantees for both classical and learned optimizers given a fixed budget of iterations. For classical optimizers, our bounds which hold with high probability are much tighter than those that worst-case guarantees provide. For learned optimizers, our bounds outperform the empirical outcomes observed in their non-learned counterparts. |
| Researcher Affiliation | Academia | Rajiv Sambharya EMAIL Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA Bartolomeo Stellato EMAIL Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA |
| Pseudocode | Yes | Algorithm 1 PAC-Bayes Learning to solve problem (15)... Algorithm 2 Calibrating the PAC-Bayes bounds |
| Open Source Code | Yes | The code to reproduce our results is available at https://github.com/stellatogrp/data_driven_optimizer_guarantees. |
| Open Datasets | Yes | We consider handwritten letters from the EMNIST dataset (Cohen et al., 2017). |
| Dataset Splits | Yes | We use 50000 training samples and evaluate on 1000 test problems in each example... We pick the number of datapoints in the training and test sets to be Ktrain = 5 and Ktest = 100 respectively. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, clock speeds, memory) are provided. The acknowledgements mention 'Princeton Research Computing resources at Princeton University' which is too general. |
| Software Dependencies | No | The paper mentions several software components like JAX, JAXOPT, ADAM, OSQP, SCS, and PEPit toolbox, but does not provide specific version numbers for any of them. For example, 'JAX (Bradbury et al., 2018)' refers to a publication year, not a software version number. |
| Experiment Setup | Yes | The hyperparameter weighting term is ρ = 10^-4. (Image deblurring) ... where ns = 4, no = 2, nu = 2, µ = 2, ρ = 2, and T = 50. (Kalman filtering) ... We take a matrix with dimensions m = 256, n = 512, and pick the number of algorithm steps to be K = 10. (Sparse coding) ... The neural network consists of two hidden layers of size 40 each with ReLU activations. The step size γ is 0.01, and we unroll 2 steps during training. (MAML) |
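To make the "data-driven guarantee" idea above concrete, here is a minimal sketch of how a high-probability bound on an optimizer's expected loss can be calibrated from sample problems. It uses a simple Hoeffding concentration term as a stand-in for the paper's PAC-Bayes calibration (Algorithm 2); the function name, the `delta` parameter, and the uniform toy losses are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hoeffding_upper_bound(losses, delta=0.05):
    """Data-driven upper bound on the expected loss (illustrative).

    Assumes each loss lies in [0, 1]. By Hoeffding's inequality, with
    probability at least 1 - delta over the draw of the n sample
    problems, the true expected loss is at most the empirical mean
    plus sqrt(log(1/delta) / (2n)). This is a simplified stand-in for
    the paper's PAC-Bayes bounds, not their exact construction.
    """
    n = len(losses)
    empirical_mean = float(np.mean(losses))
    slack = float(np.sqrt(np.log(1.0 / delta) / (2.0 * n)))
    return min(1.0, empirical_mean + slack)

# Toy example: rescaled residuals of an optimizer after a fixed
# iteration budget, evaluated on 1000 sample problems.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 0.2, size=1000)
bound = hoeffding_upper_bound(losses, delta=0.05)
```

The key point the paper exploits is that with 1000 test problems the concentration slack is already small (here about 0.04), so the data-driven bound can sit far below a worst-case analytical guarantee.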