Generalized Smooth Bilevel Optimization with Nonconvex Lower-Level
Authors: Siqi Zhang, Xing Huang, Feihu Huang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Some experimental results on hyper-parameter learning and meta learning demonstrate efficiency of our proposed methods. (Section 6. Numerical Experiments) |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China. 2MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China. Correspondence to: Feihu Huang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PNGBiO Algorithm. Input: iteration number T, initialization x0, y0, θ0, learning rates ηt, αt, βt, proximal parameter γ > 0, penalty parameter ct > 0; Output: xT, yT. Algorithm 2 S-PNGBiO Algorithm. Input: iteration number T, initialization x0, y0, θ0, learning rates ηt, αt, βt, proximal parameter γ > 0, penalty parameter ct > 0, and mini-batch size Bt; Output: xT, yT |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not mention releasing code, providing a repository link, or including code in supplementary materials. |
| Open Datasets | Yes | In this experiment, we conduct the data hyper-cleaning task (Franceschi et al., 2017; Shen & Chen, 2023) on the MNIST (Deng, 2012) and Fashion MNIST (Xiao et al., 2017) datasets, respectively. ... We use ResNet-18 (He et al., 2016) as the task-shared model at the UL problem, and use a 2-layer neural network as the task-specific model at the LL problem. ... D^tr_i and D^val_i randomly sample disjoint categories from the CIFAR10 dataset (Krizhevsky et al., 2009), respectively. |
| Dataset Splits | Yes | In the experiment, the dataset is partitioned into a training set, a validation set, and a test set at a ratio of 1:1:2. |
| Hardware Specification | No | The paper only mentions 'CPU time' but does not provide specific details about the CPU model, GPU, memory, or any other detailed hardware specifications used for experiments. (Section 6.3 Meta Learning) |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For our algorithm, we set ηt = 0.01, αt = 0.013/(t+1)^0.8, βt = 0.011/(t+1)^0.8 on the MNIST dataset, and ηt = 0.01, αt = 0.013/(t+1)^0.5, βt = 0.011/(t+1)^0.5 on the Fashion MNIST dataset. The learning rate settings of other algorithms are shown in Table 2 and Table 3. |
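The quoted step sizes appear to be polynomially decaying schedules of the form c/(t+1)^p, a common choice in stochastic bilevel methods. A minimal sketch of that reading, assuming the functional form c/(t+1)^p (the constants are the quoted MNIST settings; the form itself is an inference from the garbled source, not stated by the paper):

```python
def eta_t(t: int) -> float:
    """Constant step size eta_t = 0.01 (as quoted for both datasets)."""
    return 0.01


def alpha_t(t: int, c: float = 0.013, p: float = 0.8) -> float:
    """Decaying step size alpha_t = c / (t + 1)^p.

    Quoted MNIST setting: c = 0.013, p = 0.8; the Fashion MNIST
    setting would use p = 0.5 under the same reading.
    """
    return c / (t + 1) ** p


def beta_t(t: int, c: float = 0.011, p: float = 0.8) -> float:
    """Decaying step size beta_t = c / (t + 1)^p."""
    return c / (t + 1) ** p


if __name__ == "__main__":
    # The schedules start at c and decay monotonically with t.
    for t in (0, 9, 99):
        print(t, eta_t(t), alpha_t(t), beta_t(t))
```

At t = 0 the decaying schedules equal their constants (0.013 and 0.011), then shrink as t grows, which matches the usual convergence-analysis requirements for diminishing step sizes.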