Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization

Authors: Youran Dong, Junfeng Yang, Wei Yao, Jin Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In addition to the theoretical speedup, numerical experiments demonstrate the significant practical performance benefits of incorporating curvature information. ... In this section, we present experiments to evaluate the practical performance of the proposed NBO framework."
Researcher Affiliation | Academia | "(1) School of Mathematics, Nanjing University, Nanjing, China; (2) National Center for Applied Mathematics Shenzhen, Southern University of Science and Technology, Shenzhen, China; (3) Department of Mathematics, Southern University of Science and Technology, Shenzhen, China; (4) Detection Institute for Advanced Technology Longhua-Shenzhen (DIATLHSZ), Shenzhen, China. Correspondence to: Jin Zhang <EMAIL>."
Pseudocode | Yes | Algorithm 1: Newton-based framework for Bilevel Optimization (NBO); Algorithm 2: NBO-GD; Algorithm 3: GD(x_k, y_k, u_k; T); Algorithm 4: NSBO-SGD; Algorithm 5: SGD(x_k, y_k, u_k; T)
Open Source Code | No | The paper mentions using the "Bilevel Optimization Benchmark framework (Dagréou et al., 2022) and the Benchopt library (Moreau et al., 2022)" for experiments, but there is no explicit statement or link indicating that the authors' own implementation code for the proposed NBO framework is open-source or available.
Open Datasets | Yes | "We conduct experiments on two datasets: IJCNN1 and Covtype. ... We also conduct data hyper-cleaning experiments ... on two datasets: MNIST and Fashion MNIST (Xiao et al., 2017)." Dataset sources: (1) https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html (2) https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_covtype.html (3) http://yann.lecun.com/exdb/mnist/ (4) https://github.com/zalandoresearch/fashion-mnist
Dataset Splits | Yes | "For the synthetic dataset, we use 16,000 training samples and 4,000 validation samples. ... For IJCNN1, we employ 49,990 training samples and 91,701 validation samples. For Covtype, we utilize 371,847 training samples, 92,962 validation samples, and 116,203 testing samples. For MNIST and Fashion MNIST, we use 20,000 training samples, 5,000 validation samples, and 10,000 testing samples."
Hardware Specification | Yes | "All experiments were performed on a system equipped with an Intel(R) Xeon(R) Gold 5218R CPU running at 2.10 GHz and an NVIDIA H100 GPU with 80 GB of memory."
Software Dependencies | No | "In terms of computational frameworks, we use JAX (Bradbury et al., 2018) for MNIST, Fashion MNIST, and Covtype. For IJCNN1, we use Numba (Lam et al., 2015), as it demonstrates faster performance compared to JAX for this dataset." The paper mentions JAX and Numba by citation, but does not provide specific version numbers for these or other software dependencies used in their experiments.
Experiment Setup | Yes | "For both AmIGO and NBO, the outer step size is set to 1, and the inner step size is set to 0.03. ... The batch size for all algorithms is set to 64, except for NSBO-SGD and SHINE. For NSBO-SGD, since the size of B_k^2 is significantly larger than that of other batches in theory, we set |B_k^2| = 256, while the other batches remain at 64. ... The step sizes are tuned via grid search. ... IJCNN1: The inner step size is chosen from 6 values between 2^-5 and 1, spaced on a logarithmic scale. The outer ratio is chosen from {10^-2, 10^-1.5, 10^-1, 10^-0.5, 1}."
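The IJCNN1 step-size grid quoted above can be enumerated explicitly. A minimal sketch, not the authors' code: it assumes the outer step size equals the outer ratio times the inner step size, and all variable names are illustrative.

```python
# Six inner step sizes log-spaced between 2^-5 and 1, crossed with the
# outer-ratio grid {10^-2, 10^-1.5, 10^-1, 10^-0.5, 1} (30 combinations).
inner_steps = [2.0 ** e for e in range(-5, 1)]                  # 2^-5, ..., 2^0
outer_ratios = [10.0 ** e for e in (-2, -1.5, -1, -0.5, 0)]

# Assumption: outer step size = outer ratio * inner step size.
grid = [(s, r * s) for s in inner_steps for r in outer_ratios]
print(len(grid))  # 30 (inner step, outer step) candidates
```

Each candidate pair would then be evaluated by a full run of the algorithm, with the best pair selected on the validation set.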