Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation

Authors: Michael Sucker, Jalal Fadili, Peter Ochs

JMLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  "Finally, we conduct four practically relevant experiments to support our theory. With this, we showcase that the provided learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude. ... Figure 1 provides a preview of some experimental results, and the details will be provided in Section 7."
Researcher Affiliation: Academia
  "Michael Sucker EMAIL, Department of Mathematics, University of Tübingen, Tübingen, Germany; Jalal Fadili EMAIL, ENSICAEN, Normandie Université, CNRS, GREYC, France; Peter Ochs EMAIL, Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany"
Pseudocode: Yes
  "Algorithm 1: Iterative estimation of the probability ρ; Algorithm 2: Probabilistically constrained sampling; Algorithm 3: Procedure to find an initialization; Algorithm 4: Procedure to locate the prior; Algorithm 5: Procedure to construct the prior; Algorithm 6: Procedure to construct the posterior"
Open Source Code: Yes
  "The entire code associated with this paper can be found at https://github.com/MichiSucker/Learning-to-Optimize-with-PAC-Bayes."
Open Datasets: Yes
  "Appendix G. Additional Experiment on MNIST. This experiment considers the problem of training a neural network to do classification on the MNIST data set."
Dataset Splits: Yes
  "We use the following training procedure in all experiments: N = N_prior + N_train + N_val + N_test denotes the total number of data points, and we use N_prior = ... = N_test = 250."
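The quoted split (four equally sized subsets of 250 points each) can be sketched as follows; the function name `split_dataset` and the sequential partitioning are illustrative assumptions, not taken from the paper:

```python
def split_dataset(data, n_prior=250, n_train=250, n_val=250, n_test=250):
    """Partition N data points into prior/train/val/test subsets.

    Sketch only: the paper states N = N_prior + N_train + N_val + N_test
    with N_prior = ... = N_test = 250, but does not specify the ordering.
    """
    assert len(data) == n_prior + n_train + n_val + n_test
    b1, b2, b3 = n_prior, n_prior + n_train, n_prior + n_train + n_val
    return data[:b1], data[b1:b2], data[b2:b3], data[b3:]

# Example with N = 1000 synthetic points:
prior, train, val, test = split_dataset(list(range(1000)))
```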
Hardware Specification: No
  No specific hardware details (such as GPU/CPU models, memory, or processor types) are provided for the experimental setup; the paper only mentions general computational cost.
Software Dependencies: No
  "(Sub)gradients are defined by the output of backpropagation as it is implemented in PyTorch (Paszke et al., 2019)..." The paper mentions PyTorch but does not specify a version number.
Experiment Setup: Yes
  "In Algorithm 1, we use ρ_l = 0.95, ρ_u = 1.0, q_l = 0.01, q_u = 0.99, and ε = 0.075. ... In Algorithm 3, we use Adam with an initial step-size of 10^{-3}, which gets reduced by a factor of 0.5 every 200 iterations, until an accuracy of ε = 10^{-2} is reached, or for at most n_init = 10^3 iterations."
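The quoted step-size schedule (initial 10^{-3}, halved every 200 iterations) admits a simple closed form; the function name and this standalone formulation are illustrative, not code from the paper:

```python
def adam_step_size(iteration, initial=1e-3, factor=0.5, interval=200):
    """Step size at a given iteration under the schedule quoted above:
    the initial value is multiplied by `factor` every `interval` iterations.
    """
    return initial * factor ** (iteration // interval)

# Iterations 0-199 use 1e-3, iterations 200-399 use 5e-4, and so on.
```

In PyTorch this schedule would typically be realized with a step-wise learning-rate scheduler attached to the Adam optimizer, but the paper does not state which mechanism was used.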