Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Optimize: A Primer and A Benchmark
Authors: Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, Wotao Yin
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization. We set up taxonomies, categorize existing works and research directions, present insights, and identify open challenges. We benchmarked many existing L2O approaches on a few representative optimization problems. For reproducible research and fair benchmarking purposes, we released our software implementation and data in the package Open-L2O at https://github.com/VITA-Group/Open-L2O. Keywords: learning to optimize, meta learning, optimization, algorithm unrolling |
| Researcher Affiliation | Collaboration | Tianlong Chen, Xiaohan Chen, Wuyang Chen, Zhangyang Wang EMAIL Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA Howard Heaton EMAIL Typal Research, Typal LLC, Los Angeles, CA 90064, USA Jialin Liu, Wotao Yin EMAIL Alibaba US, Damo Academy, Decision Intelligence Lab, Bellevue, WA 98004, USA |
| Pseudocode | No | The paper describes various algorithms and update rules using mathematical equations (e.g., Eq. 2, 10a, 12, 13, 17, 18) and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks with structured, step-by-step formatting. |
| Open Source Code | Yes | For reproducible research and fair benchmarking purposes, we released our software implementation and data in the package Open-L2O at https://github.com/VITA-Group/Open-L2O. |
| Open Datasets | Yes | We generate 51,200 samples as the training set and 1,024 pairs as the validation and testing sets, following the i.i.d. sampling procedure in [43]. We sample sparse vectors x*_q with components drawn i.i.d. from the distribution Ber(0.1) · N(0, 1), yielding an average sparsity of 10%. We run numerical experiments in four settings: [...] A Gaussian random matrix is highly incoherent, making sparse recovery relatively easy. To increase the challenge, we compute a dictionary D ∈ R^{256×512} from 400 natural images in the BSD500 data set [144] using the block proximal gradient method [239] and, then, use it as the measurement matrix A, which has a high coherence. [...] train L2O optimizers on the same neural network used in [13]: a simple Multi-layer Perceptron (MLP) with one 20-dimension hidden layer and the sigmoid activation function, trained on the MNIST data set to minimize the cross-entropy loss. |
| Dataset Splits | Yes | We generate 51,200 samples as the training set and 1,024 pairs as the validation and testing sets, following the i.i.d. sampling procedure in [43]. We sample sparse vectors x*_q with components drawn i.i.d. from the distribution Ber(0.1) · N(0, 1), yielding an average sparsity of 10%. [...] We sample 12,800 pairs of x*_q and b_q for training and 1,280 pairs for validation and testing. The samples are noiseless. [...] All L2O optimizers are trained with the single model from 10,000 different random initializations drawn from N(0, 0.1). On each optimizee, now corresponding to a random initialization, the optimizers run for 100 iterations. The training uses a batch size of 128. During each testing run, we evaluate learned optimizers on an unseen testing optimizee for 10,000 steps, which is much more than the training iteration number. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory amounts, or detailed computer specifications) used for running its experiments. It mentions 'memory bottleneck' in the context of scalability but not specific hardware. |
| Software Dependencies | No | For each problem, we choose applicable approaches that include both model-free and/or model-based ones, implement them in the TensorFlow framework, ensure identical training/testing data, and evaluate them in the same but problem-specific metrics. |
| Experiment Setup | Yes | All our model-based L2O approaches take measurements b_q as input and return estimates x̂_q ≈ x*_q. They are trained to minimize the mean squared error E_{q∈Q} ‖x̂_q − x*_q‖²_2. We adopt a progressive training scheme following [31; 43; 231]. We use a batch size of 128 for training and a learning rate of 5 × 10⁻⁴. Other hyperparameters follow the default suggestions in their original papers. [...] We set 1,000 iterations for both model-free L2O methods and classic optimizers. [...] Four traditional gradient descent algorithms, including ADAM with a 10⁻¹ step size, RMSProp with a 3 × 10⁻¹ step size, GD with the line-searched step size started from 10⁻¹, and NAG (Nesterov Accelerated Gradient) with the line-searched step size started from 10⁻¹. All other hyperparameters are tuned by careful grid search. [...] All L2O optimizers are trained for 100 epochs and 1000 iterations per epoch. At the testing stage, we evaluate and report the logarithmic loss of unseen functions from the same family, which are plotted in Figure 7. |
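The sparse-recovery data generation quoted above (sparse vectors with entries drawn i.i.d. from Ber(0.1) · N(0, 1), noiseless measurements b_q = A x*_q) can be sketched as follows. This is a minimal illustration, not the Open-L2O implementation: the problem sizes m = 256, n = 512 are taken from the quoted dictionary shape D ∈ R^{256×512}, and the unit-norm column scaling of the Gaussian matrix A is an assumed convention not stated in the excerpts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem sizes, taken from the quoted D in R^{256x512};
# sample counts follow the reported 51,200 / 1,024 splits.
m, n = 256, 512
n_train = 51_200

def sample_sparse_signals(num, n, sparsity=0.1, rng=rng):
    """Sparse vectors with components drawn i.i.d. from Ber(0.1) * N(0, 1)."""
    support = rng.random((num, n)) < sparsity   # Bernoulli(0.1) mask
    values = rng.standard_normal((num, n))      # Gaussian magnitudes
    return support * values

# Gaussian measurement matrix; column normalization is an assumption here.
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0, keepdims=True)

x_train = sample_sparse_signals(n_train, n)
b_train = x_train @ A.T   # noiseless measurements b_q = A x*_q

print(x_train.shape, b_train.shape)   # (51200, 512) (51200, 256)
```

With ~26M sampled entries, the empirical fraction of nonzeros concentrates tightly around the 10% average sparsity reported in the paper.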