Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Authors: Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees. We validate the effectiveness of our methods through experiments on regularization selection, data hyper-cleaning, and coreset selection for continual learning."
Researcher Affiliation | Collaboration | "1University at Buffalo, 2Meta, 3Rice University" (author email addresses redacted as EMAIL in the extraction)
Pseudocode | Yes | "Algorithm 1: Double-loop Tuning-Free Bilevel Optimizer (D-TFBO); Algorithm 2: Single-loop Tuning-Free Bilevel Optimizer (S-TFBO)"
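The quoted algorithms are not reproduced here, but the "tuning-free" mechanism they rely on, stepsizes computed from accumulated gradient statistics rather than hand-tuned schedules, can be sketched. The following is a minimal, hypothetical AdaGrad-Norm-style stand-in on a toy quadratic; it is not the paper's actual D-TFBO/S-TFBO update, and the function name and accumulator are invented for illustration:

```python
import numpy as np

def tuning_free_gd(grad, x0, acc0=5.0, steps=200):
    """Gradient descent with stepsize 1 / sqrt(acc0 + sum of squared grad norms).

    acc0 plays the role of an initial stepsize parameter: the method should be
    insensitive to its exact value, which is the "tuning-free" selling point.
    """
    x = np.asarray(x0, dtype=float).copy()
    acc = acc0
    for _ in range(steps):
        g = grad(x)
        acc += float(np.dot(g, g))   # accumulate squared gradient norms
        x = x - g / np.sqrt(acc)     # stepsize shrinks automatically
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2 with gradient g(x) = x; minimizer at 0.
x_star = tuning_free_gd(lambda x: x, x0=np.ones(3))
```

Rerunning with a very different `acc0` (say 100 instead of 5) still converges, illustrating the robustness-to-initial-stepsize claim quoted in the Research Type row.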
Open Source Code | No | "Our implementation is based on the benchmarks provided in Dagréou et al. (2022) and Hao et al. (2024), respectively. Please refer to Appendix B for more details about practical implementation, experiment configurations, and additional plots."
Open Datasets | Yes | "We compare our proposed algorithm with benchmark bilevel algorithms including AmIGO (Arbel & Mairal, 2022), BSA (Ghadimi & Wang, 2018), FSLA (Li et al., 2022), MRBO (Yang et al., 2021), SOBA (Dagréou et al., 2022), StocBiO (Ji et al., 2021), SUSTAIN (Khanduri et al., 2021), TTSA (Hong et al., 2023b), and VRBO (Yang et al., 2021) on the Covtype dataset. We conduct experiments on the MNIST dataset, where we aim to learn a set of weights λ, one for each training sample, in addition to the model parameters θ. Following Zhou et al. (2022), we use the Split CIFAR-100 dataset and conduct experiments in the balanced and imbalanced scenarios."
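The data hyper-cleaning setup quoted above (a per-sample weight λ_i for each training example, learned jointly with the model parameters θ) can be illustrated with a self-contained toy. Everything below, the 1-D least-squares model, the sigmoid weighting, and the finite-difference hypergradient, is an invented stand-in for illustration, not the paper's MNIST implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
x_tr = rng.normal(size=30)
y_tr = 3.0 * x_tr
y_tr[:9] = -3.0 * x_tr[:9]     # corrupt ~30% of the training labels
x_va = rng.normal(size=10)
y_va = 3.0 * x_va              # clean validation set

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inner_solution(lam):
    # Closed-form argmin over theta of sum_i sigmoid(lam_i)*(theta*x_i - y_i)^2.
    w = sigmoid(lam)
    return np.sum(w * x_tr * y_tr) / np.sum(w * x_tr * x_tr)

def outer_loss(lam):
    # Validation loss of the model fitted under sample weights sigmoid(lam).
    theta = inner_solution(lam)
    return np.mean((theta * x_va - y_va) ** 2)

# Outer descent on lam with a crude finite-difference hypergradient.
lam, eps, step = np.zeros(30), 1e-5, 1.0
for _ in range(200):
    g = np.array([(outer_loss(lam + eps * e) - outer_loss(lam - eps * e)) / (2 * eps)
                  for e in np.eye(30)])
    lam -= step * g
```

After the outer loop, the corrupted samples end up with smaller weights sigmoid(λ_i) than the clean ones, which is exactly the hyper-cleaning effect the quoted experiment targets.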
Dataset Splits | No | "The training set S_T = {(d_i^train, y_i^train)}_{i=1}^n has been corrupted in this scenario... training set S_T = {(d_i^train, y_i^train)}_{i=1}^n, while the outer objective aims to determine the best regularization term λ on the validation set S_V = {(d_j^val, y_j^val)}_{j=1}^m. B.3 CONFIGURATION: We adopt the default configuration for regularization selection and data hyper-cleaning. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design."
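The inner/outer split quoted above (inner: fit θ on the training set S_T; outer: choose the regularization strength λ on the validation set S_V) can likewise be sketched on synthetic data. The quadratic losses, the exp(λ) parameterization, and the closed-form inner solution below are assumptions chosen to keep the sketch self-contained, not the paper's actual objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, scale=1.0, size=50)   # stands in for S_T
val = rng.normal(loc=2.0, scale=1.0, size=20)     # stands in for S_V

def inner_solution(lam):
    # argmin over theta of mean((theta - d)^2 over train) + exp(lam)*theta^2
    # has the closed form theta* = mean(train) / (1 + exp(lam)).
    return train.mean() / (1.0 + np.exp(lam))

def outer_loss(lam):
    # Validation loss of the regularized inner solution.
    return np.mean((inner_solution(lam) - val) ** 2)

# Outer gradient descent on the scalar regularization parameter lam.
lam, eps, step = 0.0, 1e-5, 0.5
for _ in range(300):
    g = (outer_loss(lam + eps) - outer_loss(lam - eps)) / (2 * eps)
    lam -= step * g
```

The outer loop drives λ toward the value whose regularized training solution generalizes best to the validation set, which is the role the quoted regularization-selection experiment assigns to S_V.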
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory amounts) are mentioned in the paper.
Software Dependencies | No | "Our implementation is based on the benchmarks provided in Dagréou et al. (2022) and Hao et al. (2024), respectively."
Experiment Setup | Yes | "The batch size is 64. The maximum numbers of iterations are 2048 and 512, respectively. The data corruption ratio in hyper-cleaning is 0.1. For coreset selection, we also use the default configuration except for the learning rates, due to the tuning-free design. The α0, β0, and γ0 values are set to 5."
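For reference, the quoted settings can be collected into a single hypothetical configuration dictionary. The key names are invented, and the pairing of the two iteration caps with the two experiments is left as quoted ("respectively") rather than guessed:

```python
# Hypothetical configuration collecting the quoted experiment settings; the
# actual benchmark code (Dagréou et al., 2022) stores these differently.
config = {
    "batch_size": 64,
    "max_iterations": (2048, 512),        # for the two settings, respectively
    "hyper_cleaning_corruption_ratio": 0.1,
    "alpha_0": 5.0,                       # initial stepsize parameters
    "beta_0": 5.0,
    "gamma_0": 5.0,
}
```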