Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation
Authors: Michael Sucker, Jalal Fadili, Peter Ochs
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct four practically relevant experiments to support our theory. With this, we showcase that the provided learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude. ... Figure 1 provides a preview of some experimental results, and the details will be provided in Section 7. |
| Researcher Affiliation | Academia | Michael Sucker EMAIL Department of Mathematics, University of Tübingen, Tübingen, Germany; Jalal Fadili EMAIL ENSICAEN, Normandie Université, CNRS, GREYC, France; Peter Ochs EMAIL Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany |
| Pseudocode | Yes | Algorithm 1 Iterative estimation of the probability ρ; Algorithm 2 Probabilistically constrained sampling; Algorithm 3 Procedure to find an initialization; Algorithm 4 Procedure to locate the prior; Algorithm 5 Procedure to construct the prior; Algorithm 6 Procedure to construct the posterior |
| Open Source Code | Yes | The entire code associated with this paper can be found at https://github.com/MichiSucker/Learning-to-Optimize-with-PAC-Bayes. |
| Open Datasets | Yes | Appendix G. Additional Experiment on MNIST. This experiment considers the problem of training a neural network to do classification on the MNIST data set |
| Dataset Splits | Yes | We use the following training procedure in all experiments: N = Nprior + Ntrain + Nval + Ntest denotes the total number of data points, and we use Nprior = ... = Ntest = 250. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or processor types) are provided for the experimental setup. The paper only mentions general computational cost. |
| Software Dependencies | No | (Sub)gradients are defined by the output of backpropagation as it is implemented in PyTorch (Paszke et al., 2019)... The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | In Algorithm 1, we use ρl = 0.95, ρu = 1.0, ql = 0.01, qu = 0.99, and ε = 0.075. ... In Algorithm 3, we use Adam with an initial step-size of 10^-3, which gets reduced by a factor of 0.5 every 200 iterations, until an accuracy of ε = 10^-2 is reached, or for at most ninit = 10^3 iterations. |
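The step-size schedule quoted in the Experiment Setup row (Adam starting at 10^-3, halved every 200 iterations, running until an accuracy of 10^-2 is reached or for at most 10^3 iterations) can be sketched as below. This is a minimal illustration of the quoted hyperparameters only; the function name, argument names, and loop structure are assumptions, not taken from the paper's released code:

```python
def step_size(k, lr0=1e-3, factor=0.5, every=200):
    """Step size at iteration k: start at lr0 and multiply by
    `factor` once every `every` iterations (piecewise-constant decay)."""
    return lr0 * factor ** (k // every)

def run_until_converged(loss_fn, n_init=1000, eps=1e-2):
    """Hypothetical driver mirroring the quoted stopping rule:
    stop once the loss drops below eps, or after n_init iterations.
    `loss_fn(k, lr)` stands in for one Adam step returning the loss."""
    for k in range(n_init):
        loss = loss_fn(k, step_size(k))
        if loss < eps:
            return k, loss
    return n_init, loss
```

For example, `step_size(0)` is 1e-3, `step_size(200)` is 5e-4, and by iteration 999 the step size has been halved four times.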