Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization

Authors: Youran Dong, Junfeng Yang, Wei Yao, Jin Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In addition to the theoretical speedup, numerical experiments demonstrate the significant practical performance benefits of incorporating curvature information. ... In this section, we present experiments to evaluate the practical performance of the proposed NBO framework."
Researcher Affiliation | Academia | "(1) School of Mathematics, Nanjing University, Nanjing, China; (2) National Center for Applied Mathematics Shenzhen, Southern University of Science and Technology, Shenzhen, China; (3) Department of Mathematics, Southern University of Science and Technology, Shenzhen, China; (4) Detection Institute for Advanced Technology Longhua-Shenzhen (DIATLHSZ), Shenzhen, China. Correspondence to: Jin Zhang <EMAIL>."
Pseudocode | Yes | Algorithm 1: Newton-based framework for Bilevel Optimization (NBO); Algorithm 2: NBO-GD; Algorithm 3: GD(x_k, y_k, u_k; T); Algorithm 4: NSBO-SGD; Algorithm 5: SGD(x_k, y_k, u_k; T)
Open Source Code | No | The paper mentions using the "Bilevel Optimization Benchmark framework (Dagréou et al., 2022) and the Benchopt library (Moreau et al., 2022)" for experiments, but there is no explicit statement or link indicating that the authors' own implementation code for the proposed NBO framework is open-source or available.
Open Datasets | Yes | "We conduct experiments on two datasets: IJCNN1 and Covtype. ... We also conduct data hyper-cleaning experiments ... on two datasets: MNIST and Fashion MNIST (Xiao et al., 2017)." Dataset sources: (1) https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html (2) https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_covtype.html (3) http://yann.lecun.com/exdb/mnist/ (4) https://github.com/zalandoresearch/fashion-mnist
Dataset Splits | Yes | "For the synthetic dataset, we use 16,000 training samples and 4,000 validation samples. ... For IJCNN1, we employ 49,990 training samples and 91,701 validation samples. For Covtype, we utilize 371,847 training samples, 92,962 validation samples, and 116,203 testing samples. For MNIST and Fashion MNIST, we use 20,000 training samples, 5,000 validation samples, and 10,000 testing samples."
Hardware Specification | Yes | "All experiments were performed on a system equipped with an Intel(R) Xeon(R) Gold 5218R CPU running at 2.10 GHz and an NVIDIA H100 GPU with 80 GB of memory."
Software Dependencies | No | "In terms of computational frameworks, we use JAX (Bradbury et al., 2018) for MNIST, Fashion MNIST, and Covtype. For IJCNN1, we use Numba (Lam et al., 2015), as it demonstrates faster performance compared to JAX for this dataset." The paper mentions JAX and Numba by citation, but does not provide specific version numbers for these or other software dependencies used in their experiments.
Experiment Setup | Yes | "For both AmIGO and NBO, the outer step size is set to 1, and the inner step size is set to 0.03. ... The batch size for all algorithms is set to 64, except for NSBO-SGD and SHINE. For NSBO-SGD, since the size of B_k^2 is significantly larger than that of other batches in theory, we set |B_k^2| = 256, while the other batches remain at 64. ... The step sizes are tuned via grid search. ... IJCNN1: The inner step size is chosen from 6 values between 2^-5 and 1, spaced on a logarithmic scale. The outer ratio is chosen from {10^-2, 10^-1.5, 10^-1, 10^-0.5, 1}."
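The IJCNN1 step-size grid quoted above can be enumerated explicitly. A minimal sketch, not the authors' code: it assumes the outer step size equals the outer ratio times the inner step size, and all variable names are illustrative.

```python
# Six inner step sizes log-spaced between 2^-5 and 1, crossed with the
# outer-ratio grid {10^-2, 10^-1.5, 10^-1, 10^-0.5, 1} (30 combinations).
inner_steps = [2.0 ** e for e in range(-5, 1)]                  # 2^-5, ..., 2^0
outer_ratios = [10.0 ** e for e in (-2, -1.5, -1, -0.5, 0)]

# Assumption: outer step size = outer ratio * inner step size.
grid = [(s, r * s) for s in inner_steps for r in outer_ratios]
print(len(grid))  # 30 (inner step, outer step) candidates
```

Each candidate pair would then be evaluated by a full run of the algorithm, with the best pair selected on the validation set.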