A Generalization Result for Convergence in Learning-to-Optimize

Authors: Michael Sucker, Peter Ochs

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "We conduct two experiments to show the validity of our claims: we use a neural-network-based iterative optimization algorithm to a) solve quadratic problems and b) train another neural network. In both cases, the learned algorithm outperforms the baseline and converges to a critical point with high probability."
Researcher Affiliation | Academia | "¹Department of Mathematics, University of Tübingen, Tübingen, Germany. ²Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany. Correspondence to: Michael Sucker <EMAIL>, Peter Ochs <EMAIL>."
Pseudocode | No | The paper describes algorithmic updates in text and through network-architecture diagrams (Figures 3 and 4) but does not contain structured pseudocode or algorithm blocks with labeled steps.
Open Source Code | Yes | "The code to reproduce the results can be found at https://github.com/MichiSucker/COLA_2024."
Open Datasets | No | "For the quadratic problems, the strong-convexity and smoothness constants of ℓ are sampled randomly in the intervals [m₋, m₊], [L₋, L₊] ⊂ (0, +∞), and we define the matrix A_j, j = 1, ..., N, as a diagonal matrix with entries a^j_{ii} = m_j + i(√L_j − m_j)/d, i = 1, ..., d. For training neural networks, we use functions g_i as polynomials of degree d = 5, where we sample the coefficients (c_{i,0}, ..., c_{i,5}) uniformly in [−5, 5]. Similarly, we sample the points {x_{i,j}}_{j=1}^{K} uniformly in [−2, 2]." All datasets are generated synthetically according to these rules rather than drawn from publicly available external datasets.
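The synthetic data generation quoted above can be sketched as follows. This is a hedged illustration, not the paper's code: the dimension `d`, the sampling intervals for m and L, and the linear interpolation used for the diagonal entries are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_quadratic(d=10, m_range=(0.1, 1.0), L_range=(2.0, 10.0)):
    # Draw a strong-convexity constant m and a smoothness constant L,
    # then build a diagonal matrix whose entries increase from roughly
    # m to L. (A plain linear interpolation is an assumption here.)
    m = rng.uniform(*m_range)
    L = rng.uniform(*L_range)
    i = np.arange(1, d + 1)
    A = np.diag(m + i * (L - m) / d)
    b = rng.standard_normal(d)
    return A, b  # defines the quadratic loss 0.5 * ||A x - b||^2

def sample_polynomial_problem(K=50, degree=5):
    # Degree-5 polynomial with coefficients (c_0, ..., c_5) uniform in
    # [-5, 5], evaluated at K points drawn uniformly from [-2, 2].
    coeffs = rng.uniform(-5.0, 5.0, size=degree + 1)
    x = rng.uniform(-2.0, 2.0, size=K)
    y = np.polyval(coeffs[::-1], x)  # polyval expects highest degree first
    return x, y
```

Each call produces one independent training instance, so repeating the calls yields the i.i.d. problem collections used throughout the experiments.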
Dataset Splits | Yes | "For the construction of the prior, we use N_prior = 500 functions, for the probabilistic constraint we use N_val = 500 functions, and for the PAC-Bayesian optimization step we use N_train = 250 functions, all of which are sampled i.i.d., that is, the data sets are independent of each other. The right plot shows the estimated probabilities ... on 250 test sets of size N = 250."
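The split described above can be sketched as three independently sampled collections; the problem sampler below is a hypothetical placeholder, since any of the synthetic problem generators would slot in:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_problem():
    # Placeholder for one randomly drawn optimization problem
    # (e.g., a quadratic instance); here just a random parameter vector.
    return rng.standard_normal(5)

# Three data sets, sampled i.i.d. and independently of each other,
# matching the sizes quoted above.
prior_set = [sample_problem() for _ in range(500)]  # N_prior = 500
val_set   = [sample_problem() for _ in range(500)]  # N_val   = 500
train_set = [sample_problem() for _ in range(250)]  # N_train = 250
```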
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments; it only mentions general experimental procedures.
Software Dependencies | No | "As baseline we use Adam (Kingma & Ba, 2015) as it is implemented in PyTorch (Paszke et al., 2019)." Although PyTorch is mentioned, no version number is provided, which is necessary for reproducibility.
Experiment Setup | Yes | "Thus, we restrict to t_train = 500 iterations. Finally, we consider ξ to be converged if the loss is smaller than 10⁻¹⁶. ... Adam (Kingma & Ba, 2015) ... we tune its step-size with a simple grid search over 100 values in [10⁻⁴, 10⁻²], such that its performance is best for the given t_train = 250 iterations. This yields the value κ = 0.008. We run this procedure for 150 · 10³ iterations."
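The step-size tuning quoted above can be sketched with a grid search around a minimal NumPy Adam implementation. The test function, iteration counts, and grid spacing below are illustrative assumptions; the paper itself uses PyTorch's Adam.

```python
import numpy as np

def adam_minimize(grad, x0, lr, n_steps=250,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam update (Kingma & Ba, 2015) with bias correction.
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

def tune_step_size(f, grad, x0, n_grid=100):
    # Grid search over n_grid step sizes in [1e-4, 1e-2] (log-spaced
    # here as an assumption), keeping the one with the best final loss
    # after n_steps Adam iterations.
    lrs = np.logspace(-4, -2, n_grid)
    losses = [f(adam_minimize(grad, x0, lr)) for lr in lrs]
    best = int(np.argmin(losses))
    return lrs[best], losses[best]
```

On a simple quadratic such as f(x) = ½‖x‖², the selected step size stays within the searched interval and the corresponding final loss lies far below the initial loss, mirroring the role of the tuned value κ above.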