Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions

Authors: Xuxing Chen, Tesi Xiao, Krishnakumar Balasubramanian

JMLR 2024

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. Numerical experiments demonstrate the performance gain of the proposed algorithms over existing ones.
Researcher Affiliation: Collaboration. Xuxing Chen (Department of Mathematics, University of California, Davis); Tesi Xiao (Amazon Inc.); Krishnakumar Balasubramanian (Department of Statistics, University of California, Davis).
Pseudocode: Yes. Algorithm 1: Moving-Average SOBA; Algorithm 2: Multi-Objective Robust Moving-Average SOBA.
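For intuition about the pseudocode, the moving-average mechanism can be illustrated on a toy problem. The sketch below is not the paper's algorithm verbatim: the bilevel problem, step sizes, and noise model are illustrative assumptions; only the overall structure (an inner step, an auxiliary linear-system step, and a moving average of the hypergradient estimate before the outer step) follows the SOBA-style updates the paper builds on.

```python
import random

# Toy bilevel problem (illustrative only, not the paper's experiments):
#   inner:  g(x, y) = 0.5 * (y - x)^2  ->  y*(x) = x
#   outer:  f(x, y) = 0.5 * y^2        ->  F(x) = 0.5 * x^2, minimized at x = 0
# The exact hypergradient is F'(x) = x.

def ma_soba_toy(iters=3000, alpha=0.05, beta=0.1, gamma=0.1, theta=0.2,
                noise=0.01, seed=0):
    rng = random.Random(seed)
    x, y, v, m = 2.0, 0.0, 0.0, 0.0
    for _ in range(iters):
        # Inner step on y: grad_y g(x, y) = y - x (plus simulated noise)
        y -= beta * ((y - x) + rng.gauss(0.0, noise))
        # Auxiliary step on v, approximately solving grad_yy(g) v = grad_y f;
        # here grad_yy g = 1 and grad_y f = y
        v -= gamma * ((v - y) + rng.gauss(0.0, noise))
        # Hypergradient estimate: grad_x f - grad_xy(g) v = 0 - (-1) * v
        d = v + rng.gauss(0.0, noise)
        # Moving-average smoothing of the update direction (the "MA" part)
        m = (1.0 - theta) * m + theta * d
        # Outer step on x
        x -= alpha * m
    return x

print(abs(ma_soba_toy()))  # distance to the optimum x = 0 after 3000 iterations
```

The moving average lets the outer step use a smoothed, low-variance direction without inner loops or large batches, which is the intuition behind the single-timescale design.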
Open Source Code: No. "Our experiments for MA-SOBA are performed with the aid of the recently developed package Benchopt (Moreau et al., 2022) and the open-sourced bilevel optimization benchmark."
Open Datasets: Yes. "We first compare the performance of MA-SOBA with other benchmark methods on two common tasks proposed in previous works (Ji et al., 2021; Hong et al., 2023; Dagréou et al., 2022): hyperparameter optimization for ℓ2-penalized logistic regression, and data hyper-cleaning on the corrupted MNIST data set. To demonstrate the practical performance of MORMA-SOBA, we then conduct experiments in robust multi-task representation learning introduced in Gu et al. (2023) on the Fashion-MNIST data set (Xiao et al., 2017)."
Dataset Splits: Yes. "In the first task... |Dtrain| = 49,990, |Dval| = 91,701, and d = 22. In the second task... The data set is partitioned into a training set Dtrain, a validation set Dval, and a test set Dtest, where |Dtrain| = 20,000, |Dval| = 5,000, and |Dtest| = 10,000. For each task i ∈ [10] above, we partition its data set into the training set Dtrain_i, validation set Dval_i, and test set Dtest_i."
Hardware Specification: No. The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies: No. "Our experiments for MA-SOBA are performed with the aid of the recently developed package Benchopt (Moreau et al., 2022) and the open-sourced bilevel optimization benchmark. [...] The implementation of MORBiT follows the same manner described in Gu et al. (2023). ... SAGA updates (Defazio et al., 2014)."
Experiment Setup: Yes. "In all our experiments, we employ a batch size of 64 for all methods... For methods involving an inner loop (stocBiO, BSA, AmIGO), we perform 10 inner steps per outer iteration... For methods that involve Neumann approximation for Hessian-vector products... we perform 10 steps of the subroutine per outer iteration. ... The step sizes and momentum parameters used in all benchmark algorithms are directly adopted from the fine-tuned parameters provided by Dagréou et al. (2022). From a grid search, we select the best constant step sizes for MA-SOBA. Following a grid search, we have selected the parameters in MA-SOBA as αkτ = 0.02, βk = γk = 0.01, and θk = 0.1. Following a grid search, we have selected the parameters in MA-SOBA as αkτ = 10^-3, βk = γk = 10^-2, and θk = 10^-1. For the implementation of MORMA-SOBA, the regularization parameter µλ in (10) is set to be 0.01. All remaining parameters are chosen as constant values, as listed below: Outer variable: τx = 1, αk = 0.02; Inner variable: βk = 0.02; Auxiliary variable: γk = 0.02; Simplex variable: τλ = 1, αk = 0.02; Average gradient: θk = 0.6. Both evaluated methods use batch sizes of 8 and 128 to compute gi for each inner step and fi for each outer iteration, respectively."
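The setup row repeatedly mentions selecting constant step sizes by grid search. A minimal sketch of such a search on a hypothetical one-dimensional objective (the objective, helper names, and grid values are all illustrative assumptions, not taken from the paper):

```python
from itertools import product

# Hypothetical grid search over a step size and a moving-average weight,
# in the spirit of the quoted setup.

def run_toy(alpha, theta, iters=300):
    """Minimize F(x) = 0.5 * x^2 with a moving-average gradient step;
    return the final distance to the optimum x = 0."""
    x, m = 5.0, 0.0
    for _ in range(iters):
        m = (1.0 - theta) * m + theta * x  # moving average of grad F(x) = x
        x -= alpha * m
    return abs(x)

alphas = [0.005, 0.02, 0.1]
thetas = [0.1, 0.6]
scores = {(a, t): run_toy(a, t) for a, t in product(alphas, thetas)}
best = min(scores, key=scores.get)  # parameter pair with the smallest final error
```

In practice each grid point would be a full training run scored on validation loss, but the selection rule is the same: run every combination and keep the best-scoring one.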