Generalized Smooth Bilevel Optimization with Nonconvex Lower-Level

Authors: Siqi Zhang, Xing Huang, Feihu Huang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Some experimental results on hyper-parameter learning and meta learning demonstrate efficiency of our proposed methods. (Section 6. Numerical Experiments)
Researcher Affiliation Academia 1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China. 2MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China. Correspondence to: Feihu Huang <EMAIL>.
Pseudocode Yes Algorithm 1 (PNGBiO). Input: iteration number T, initialization x_0, y_0, θ_0, learning rates η_t, α_t, β_t, proximal parameter γ > 0, penalty parameter c_t > 0; Output: x_T, y_T. Algorithm 2 (S-PNGBiO). Input: iteration number T, initialization x_0, y_0, θ_0, learning rates η_t, α_t, β_t, proximal parameter γ > 0, penalty parameter c_t > 0, and mini-batch size B_t ≥ 1; Output: x_T, y_T.
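The pseudocode above only exposes the algorithms' interface (iterates x_t, y_t, learning rates, proximal parameter γ, penalty parameter c_t). As a hedged illustration of the general penalty-based bilevel template such methods build on, here is a minimal sketch on a toy quadratic problem; the update rules below are a generic penalty gradient method, not the authors' PNGBiO updates, and all names are ours:

```python
def penalty_bilevel(T=20000, lr=1e-3, c=100.0):
    """Generic penalty-method sketch for a toy bilevel problem.

    Upper level: f(x, y) = (x - 1)^2 + y^2
    Lower level: g(x, y) = (y - x)^2, whose minimum value is 0,
    so the penalized objective is f(x, y) + c * g(x, y).
    """
    x, y = 0.0, 0.0
    for _ in range(T):
        # Gradients of the penalized objective w.r.t. x and y.
        grad_x = 2.0 * (x - 1.0) - 2.0 * c * (y - x)
        grad_y = 2.0 * y + 2.0 * c * (y - x)
        x -= lr * grad_x
        y -= lr * grad_y
    return x, y  # for this quadratic, x converges to (1 + c) / (1 + 2c)
```

For large c the penalty forces y ≈ x, so the iterates approach the bilevel solution of minimizing (x - 1)^2 + x^2, i.e. x ≈ 0.5.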
Open Source Code No The paper does not provide concrete access to source code for the methodology described. It does not mention releasing code, providing a repository link, or including code in supplementary materials.
Open Datasets Yes In this experiment, we conduct the data hyper-cleaning task (Franceschi et al., 2017; Shen & Chen, 2023) on the MNIST (Deng, 2012) and Fashion MNIST (Xiao et al., 2017) datasets, respectively. ... We use Resnet-18 (He et al., 2016) as the task-shared model at the UL problem, and use a 2-layer neural network as the task-specific model at the LL problem. ... Dtr i and Dval i randomly sample disjoint categories from the CIFAR10 dataset (Krizhevsky et al., 2009), respectively.
Dataset Splits Yes In the experiment, the dataset is partitioned into a training set, a validation set, and a test set at a ratio of 1:1:2.
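The reported 1:1:2 train/validation/test partition can be sketched as follows (a minimal illustration with names of our own choosing, not the paper's code):

```python
import random

def split_1_1_2(items, seed=0):
    """Partition items into train/val/test at a 1:1:2 ratio."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = n // 4   # 1 part of 4
    n_val = n // 4     # 1 part of 4
    # remaining 2 parts of 4 form the test set
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_1_1_2(range(1000))
# 250 / 250 / 500 examples, respectively
```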
Hardware Specification No The paper only mentions 'CPU time' but does not provide specific details about the CPU model, GPU, memory, or any other detailed hardware specifications used for experiments. (Section 6.3 Meta Learning)
Software Dependencies No The paper does not list specific software dependencies (e.g., library names with version numbers).
Experiment Setup Yes For our algorithm, we set η_t = 0.01, α_t = 0.013/(t+1)^0.8, β_t = 0.011/(t+1)^0.8 on the MNIST dataset and η_t = 0.01, α_t = 0.013/(t+1)^0.5, β_t = 0.011/(t+1)^0.5 on the Fashion MNIST dataset. The learning-rate settings of the other algorithms are shown in Table 2 and Table 3.
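The decaying step sizes above all follow the same polynomial-decay pattern c/(t+1)^p. A small sketch, assuming 0-indexed iterations (the helper name is ours):

```python
def lr_schedule(t, c, p):
    """Polynomial-decay step size c / (t + 1)^p at iteration t."""
    return c / (t + 1) ** p

# Reported MNIST settings: eta_t = 0.01 (constant),
# alpha_t = 0.013/(t+1)^0.8, beta_t = 0.011/(t+1)^0.8.
eta = 0.01
alpha_0 = lr_schedule(0, 0.013, 0.8)  # 0.013 at t = 0
alpha_9 = lr_schedule(9, 0.013, 0.8)  # shrunk by a factor of 10^0.8
```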