Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis
Authors: Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees. We validate the effectiveness of our methods through experiments on regularization selection, data hyper-cleaning, and coreset selection for continual learning. |
| Researcher Affiliation | Collaboration | ¹University at Buffalo, ²Meta, ³Rice University |
| Pseudocode | Yes | Algorithm 1 Double-loop Tuning-Free Bilevel Optimizer (D-TFBO) Algorithm 2 Single-loop Tuning-Free Bilevel Optimizer (S-TFBO) |
| Open Source Code | No | Our implementation is based on the benchmark provided in Dagréou et al. (2022) and Hao et al. (2024), respectively. Please refer to Appendix B for more details about practical implementation, experiment configurations, and additional plots. |
| Open Datasets | Yes | We compare our proposed algorithm with benchmark bilevel algorithms including AmIGO (Arbel & Mairal, 2022), BSA (Ghadimi & Wang, 2018), FSLA (Li et al., 2022), MRBO (Yang et al., 2021), SOBA (Dagréou et al., 2022), StocBiO (Ji et al., 2021), SUSTAIN (Khanduri et al., 2021), TTSA (Hong et al., 2023b), VRBO (Yang et al., 2021) on the Covtype dataset. We conduct experiments on the MNIST dataset, where we aim to learn a set of weights λ, one for each training sample, in addition to the model parameters θ. Following Zhou et al. (2022), we use the Split CIFAR100 dataset and conduct experiments in the balanced and imbalanced scenarios. |
| Dataset Splits | No | The training set $S_T = \{(d_i^{\text{train}}, y_i^{\text{train}})\}_{1 \le i \le n}$ has been corrupted in this scenario... the inner objective is defined on the training set $S_T = \{(d_i^{\text{train}}, y_i^{\text{train}})\}_{1 \le i \le n}$, while the outer objective aims to determine the best regularization term λ on the validation set $S_V = \{(d_j^{\text{val}}, y_j^{\text{val}})\}_{1 \le j \le m}$. B.3 CONFIGURATION We adopt the default configuration for regularization selection and data hyper-cleaning. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are mentioned in the paper. |
| Software Dependencies | No | Our implementation is based on the benchmark provided in Dagréou et al. (2022) and Hao et al. (2024), respectively. |
| Experiment Setup | Yes | The batch size is 64. The maximum iterations are 2048 and 512, respectively. The data corruption ratio in hyper-cleaning is 0.1. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design. The α₀, β₀, and γ₀ values are set to 5. |
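The core "tuning-free" ingredient reported above is that stepsizes are set automatically rather than tuned. As a minimal single-level sketch of this style of update (an AdaGrad-Norm-like inverse-cumulative stepsize; the actual D-TFBO/S-TFBO updates and their bilevel coupling are specified in the paper, not here), the stepsize can be derived from accumulated gradient norms with only an initial value `alpha0`:

```python
import numpy as np

def cumulative_norm_gd(grad, x0, alpha0=5.0, iters=2048):
    """Gradient descent with a cumulative-gradient-norm stepsize.

    Hypothetical sketch: eta_t = alpha0 / sqrt(sum_{s<=t} ||g_s||^2),
    so the stepsize shrinks automatically and no schedule is hand-tuned.
    alpha0=5 mirrors the initial values quoted in the experiment setup.
    """
    x = np.asarray(x0, dtype=float)
    acc = 1e-12  # running sum of squared gradient norms (avoids div by 0)
    for _ in range(iters):
        g = grad(x)
        acc += np.dot(g, g)
        x = x - (alpha0 / np.sqrt(acc)) * g
    return x
```

On a simple quadratic (`grad = lambda x: x`) this drives the iterate to the minimizer without any manual stepsize selection, which is the robustness property the paper's experiments probe by varying the initial stepsize.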
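The regularization-selection task quoted in the Dataset Splits row is the textbook bilevel example: the inner problem fits parameters on the training set under a ridge penalty λ, and the outer problem evaluates λ on the validation set. As a deterministic sketch (not the paper's stochastic algorithm), the hypergradient of the validation loss with respect to a scalar λ can be computed exactly via the implicit function theorem:

```python
import numpy as np

def hypergrad_ridge(lmbda, Xtr, ytr, Xval, yval):
    """Exact hypergradient of the validation loss w.r.t. a ridge penalty.

    Inner:  theta*(lmbda) = argmin_theta 0.5/n ||Xtr theta - ytr||^2
                                         + 0.5 * lmbda * ||theta||^2
    Outer:  F(lmbda)      = 0.5/m ||Xval theta*(lmbda) - yval||^2
    Implicit function theorem gives d theta*/d lmbda = -H^{-1} theta*,
    where H is the inner Hessian.
    """
    n, d = Xtr.shape
    m = Xval.shape[0]
    H = Xtr.T @ Xtr / n + lmbda * np.eye(d)         # inner Hessian
    theta = np.linalg.solve(H, Xtr.T @ ytr / n)     # inner minimizer
    grad_outer = Xval.T @ (Xval @ theta - yval) / m # dF/dtheta at theta*
    dtheta = -np.linalg.solve(H, theta)             # d theta*/d lmbda
    return grad_outer @ dtheta, theta
```

In a stochastic bilevel method, the Hessian solve is replaced by an iterative approximation with its own stepsize, which is exactly where the tuning-free design removes another hyperparameter.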