Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Authors: Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees. We validate the effectiveness of our methods through experiments on regularization selection, data hyper-cleaning, and coreset selection for continual learning."
Researcher Affiliation | Collaboration | "1University at Buffalo, 2Meta, 3Rice University" (author email addresses redacted as EMAIL in the extraction)
Pseudocode | Yes | "Algorithm 1: Double-loop Tuning-Free Bilevel Optimizer (D-TFBO); Algorithm 2: Single-loop Tuning-Free Bilevel Optimizer (S-TFBO)"
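The quoted algorithms are not reproduced here, but the "tuning-free" mechanism they rely on, stepsizes computed from accumulated gradient statistics rather than hand-tuned schedules, can be sketched. The following is a minimal, hypothetical AdaGrad-Norm-style stand-in on a toy quadratic; it is not the paper's actual D-TFBO/S-TFBO update, and the function name and accumulator are invented for illustration:

```python
import numpy as np

def tuning_free_gd(grad, x0, acc0=5.0, steps=200):
    """Gradient descent with stepsize 1 / sqrt(acc0 + sum of squared grad norms).

    acc0 plays the role of an initial stepsize parameter: the method should be
    insensitive to its exact value, which is the "tuning-free" selling point.
    """
    x = np.asarray(x0, dtype=float).copy()
    acc = acc0
    for _ in range(steps):
        g = grad(x)
        acc += float(np.dot(g, g))   # accumulate squared gradient norms
        x = x - g / np.sqrt(acc)     # stepsize shrinks automatically
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2 with gradient g(x) = x; minimizer at 0.
x_star = tuning_free_gd(lambda x: x, x0=np.ones(3))
```

Rerunning with a very different `acc0` (say 100 instead of 5) still converges, illustrating the robustness-to-initial-stepsize claim quoted in the Research Type row.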
Open Source Code | No | "Our implementation is based on the benchmarks provided in Dagréou et al. (2022) and Hao et al. (2024), respectively. Please refer to Appendix B for more details about practical implementation, experiment configurations, and additional plots."
Open Datasets | Yes | "We compare our proposed algorithm with benchmark bilevel algorithms including AmIGO (Arbel & Mairal, 2022), BSA (Ghadimi & Wang, 2018), FSLA (Li et al., 2022), MRBO (Yang et al., 2021), SOBA (Dagréou et al., 2022), StocBiO (Ji et al., 2021), SUSTAIN (Khanduri et al., 2021), TTSA (Hong et al., 2023b), and VRBO (Yang et al., 2021) on the Covtype dataset. We conduct experiments on the MNIST dataset, where we aim to learn a set of weights λ, one for each training sample, in addition to the model parameters θ. Following Zhou et al. (2022), we use the Split CIFAR-100 dataset and conduct experiments in the balanced and imbalanced scenarios."
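The data hyper-cleaning setup quoted above (a per-sample weight λ_i for each training example, learned jointly with the model parameters θ) can be illustrated with a self-contained toy. Everything below, the 1-D least-squares model, the sigmoid weighting, and the finite-difference hypergradient, is an invented stand-in for illustration, not the paper's MNIST implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
x_tr = rng.normal(size=30)
y_tr = 3.0 * x_tr
y_tr[:9] = -3.0 * x_tr[:9]     # corrupt ~30% of the training labels
x_va = rng.normal(size=10)
y_va = 3.0 * x_va              # clean validation set

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inner_solution(lam):
    # Closed-form argmin over theta of sum_i sigmoid(lam_i)*(theta*x_i - y_i)^2.
    w = sigmoid(lam)
    return np.sum(w * x_tr * y_tr) / np.sum(w * x_tr * x_tr)

def outer_loss(lam):
    # Validation loss of the model fitted under sample weights sigmoid(lam).
    theta = inner_solution(lam)
    return np.mean((theta * x_va - y_va) ** 2)

# Outer descent on lam with a crude finite-difference hypergradient.
lam, eps, step = np.zeros(30), 1e-5, 1.0
for _ in range(200):
    g = np.array([(outer_loss(lam + eps * e) - outer_loss(lam - eps * e)) / (2 * eps)
                  for e in np.eye(30)])
    lam -= step * g
```

After the outer loop, the corrupted samples end up with smaller weights sigmoid(λ_i) than the clean ones, which is exactly the hyper-cleaning effect the quoted experiment targets.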
Dataset Splits | No | "The training set S_T = {(d_i^train, y_i^train)}_{i=1}^n has been corrupted in this scenario... training set S_T = {(d_i^train, y_i^train)}_{i=1}^n, while the outer objective aims to determine the best regularization term λ on the validation set S_V = {(d_j^val, y_j^val)}_{j=1}^m. B.3 CONFIGURATION: We adopt the default configuration for regularization selection and data hyper-cleaning. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design."
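The inner/outer split quoted above (inner: fit θ on the training set S_T; outer: choose the regularization strength λ on the validation set S_V) can likewise be sketched on synthetic data. The quadratic losses, the exp(λ) parameterization, and the closed-form inner solution below are assumptions chosen to keep the sketch self-contained, not the paper's actual objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, scale=1.0, size=50)   # stands in for S_T
val = rng.normal(loc=2.0, scale=1.0, size=20)     # stands in for S_V

def inner_solution(lam):
    # argmin over theta of mean((theta - d)^2 over train) + exp(lam)*theta^2
    # has the closed form theta* = mean(train) / (1 + exp(lam)).
    return train.mean() / (1.0 + np.exp(lam))

def outer_loss(lam):
    # Validation loss of the regularized inner solution.
    return np.mean((inner_solution(lam) - val) ** 2)

# Outer gradient descent on the scalar regularization parameter lam.
lam, eps, step = 0.0, 1e-5, 0.5
for _ in range(300):
    g = (outer_loss(lam + eps) - outer_loss(lam - eps)) / (2 * eps)
    lam -= step * g
```

The outer loop drives λ toward the value whose regularized training solution generalizes best to the validation set, which is the role the quoted regularization-selection experiment assigns to S_V.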
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory amounts) are mentioned in the paper.
Software Dependencies | No | "Our implementation is based on the benchmarks provided in Dagréou et al. (2022) and Hao et al. (2024), respectively."
Experiment Setup | Yes | "The batch size is 64. The maximum numbers of iterations are 2048 and 512, respectively. The data corruption ratio in hyper-cleaning is 0.1. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design. The α0, β0, and γ0 values are set to 5."
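For reference, the quoted settings can be collected into a single hypothetical configuration dictionary. The key names are invented, and the pairing of the two iteration caps with the two experiments is left as quoted ("respectively") rather than guessed:

```python
# Hypothetical configuration collecting the quoted experiment settings; the
# actual benchmark code (Dagréou et al., 2022) stores these differently.
config = {
    "batch_size": 64,
    "max_iterations": (2048, 512),        # for the two settings, respectively
    "hyper_cleaning_corruption_ratio": 0.1,
    "alpha_0": 5.0,                       # initial stepsize parameters
    "beta_0": 5.0,
    "gamma_0": 5.0,
}
```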