Efficiently Escaping Saddle Points in Bilevel Optimization
Authors: Minhui Huang, Xuxing Chen, Kaiyi Ji, Shiqian Ma, Lifeng Lai
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present the experimental results to demonstrate the efficiency of our algorithm. We reformulate the problem in Du et al. (2017) as a bilevel optimization problem (1.1) and then compare our Algorithm 1 with AID-BiO in Ji et al. (2021). More precisely, we consider the following bilevel optimization problem: min_{x ∈ ℝ^d} Φ(x) := f_1(x, y*(x)), s.t. y*(x) = argmin_{y ∈ ℝ} f_2(x, y). (5.1) Motivated by Du et al. (2017), we construct the following functions. For the upper-level function we have [...] In Figure 1 we plot the learning curves of Φ(x) − min Φ(x) vs. iteration number. Our algorithm is denoted as PBO, and AID-BiO is denoted as BO in Figure 1. Note that each learning curve is nearly a step function consisting of vertical and horizontal line segments. A horizontal segment indicates that the function value does not change, from which we may deduce that the iterates are stuck at a saddle point. Each vertical segment indicates that a perturbation successfully helps the iterate escape the saddle point. We observe that under different parameter choices our Algorithm 1 escapes saddle points more efficiently than the standard bilevel optimization algorithm. |
| Researcher Affiliation | Academia | Minhui Huang, Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA; Xuxing Chen, Department of Mathematics, University of California, Davis, CA 95616, USA; Kaiyi Ji, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA; Shiqian Ma, Department of Computational Applied Mathematics and Operations Research, Rice University, Houston, TX 77005, USA; Lifeng Lai, Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA |
| Pseudocode | Yes | Algorithm 1 Perturbed Algorithms for Minimax and Bilevel Optimization Problems; Algorithm 2 Inexact NEgative-curvature-Originated-from-Noise Algorithm (iNEON); Algorithm 3 StocBiO with iNEON |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide links to a code repository. |
| Open Datasets | No | In the deterministic case (Section 5.1), the authors construct functions based on Du et al. (2017) rather than using a pre-existing dataset. For the stochastic case (Section 5.2), the paper states: "We first generate U in which every entry is sampled from the normal distribution N(0, 1/d), and then generate m matrices {Ai : i = 1, ..., m} where each entry is sampled from standard normal distribution N(0, 1)." This describes synthetic data generation, not the use of an open, pre-existing dataset. |
| Dataset Splits | No | The paper does not specify training/test/validation splits. For the deterministic case, it constructs functions. For the stochastic case, it describes generating synthetic data and parameters for that generation, but not how that generated data is partitioned into splits for experimentation. |
| Hardware Specification | No | The paper conducts numerical experiments but does not provide any specific details about the hardware used, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers. |
| Experiment Setup | Yes | In our experiments, we choose the total number of iterations to be 1000 and all stepsizes to be 0.05 in both Algorithm 1 and AID-BiO. Following Du et al. (2017), we conduct the comparison using different choices of problem parameters. [...] For both algorithms, we set d = 50, r = 5, m = 2000, K = 200, D = 5, α = 0.1, β = 0.03 and the batch size to be 200. For Algorithm 3, we set T = 5, r = 0.5, ϵ = 0.03, ρ_Φ = 0.001, τ = 0.03, η = 0.1, F = 0.001. |
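The escape mechanism described in the Research Type row (flat learning-curve segments at saddle points, vertical drops after a perturbation) can be sketched with generic perturbed gradient descent. This is not the paper's Algorithm 1 (which handles the full bilevel structure with an inexact hypergradient); it is a minimal single-level illustration, and the test function `phi` below is a hypothetical example with a strict saddle at the origin, not one of the constructions from Du et al. (2017).

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, noise_radius=0.1, g_thresh=1e-3,
                 n_iters=1000, seed=0):
    """Generic perturbed gradient descent: when the gradient is small
    (a candidate saddle point), inject a small random ball perturbation
    so the iterate can pick up a negative-curvature direction and escape;
    otherwise take an ordinary gradient step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            u = rng.standard_normal(x.shape)
            x = x + noise_radius * u / np.linalg.norm(u)
        else:
            x = x - eta * g
    return x

# Hypothetical test function: Phi(x) = 0.5 * x^T diag(1, -1) x + 0.25 * ||x||^4.
# The origin is a strict saddle (negative curvature along x2); the local
# minima sit at x = (0, +1) and (0, -1).
H = np.diag([1.0, -1.0])

def grad_phi(x):
    return H @ x + np.dot(x, x) * x

# Plain gradient descent started at the saddle (where grad_phi = 0) would
# never move; the perturbed iterate drifts off and converges near a minimum.
x_final = perturbed_gd(grad_phi, x0=np.zeros(2))
```

The perturbation-then-descent cycles of this loop are exactly what produce the step-shaped learning curves the quoted passage describes: the iterate sits at the saddle (horizontal segment) until a perturbation triggers a rapid decrease (vertical segment).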