Decentralized Bilevel Optimization: A Perspective from Transient Iteration Complexity

Authors: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to validate the influence on convergence of communication topologies, data heterogeneity, the moving-average parameter θ, and the finite-difference parameter δ. We also compare D-SOBA to other decentralized SBO algorithms and show the benefit of the finite-difference Hessian/Jacobian-vector approximation of D-SOBA-FO in high-dimensional settings.
Researcher Affiliation | Academia | Boao Kong, Center for Data Science, Peking University, Beijing, China; Shuchen Zhu, Center for Data Science, Peking University, Beijing, China; Songtao Lu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR; Xinmeng Huang, Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, USA; Kun Yuan, Center for Machine Learning Research, Peking University, and AI for Science Institute, Beijing, China
Pseudocode | Yes | Algorithm 1: D-SOBA-SO and D-SOBA-FO
Open Source Code | No | The paper does not contain any explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Here we consider a data hyper-cleaning problem (Shaban et al., 2019) for the Fashion-MNIST dataset (Xiao et al., 2017) ... Consider a meta-learning problem (Finn et al., 2017)... The dataset is miniImageNet (Vinyals et al., 2016), which is generated from ImageNet (Russakovsky et al., 2015) and consists of 100 classes, with each class containing 600 images of size 84 x 84.
Dataset Splits | Yes | Then we split the 60,000 training images into a training set and a validation set, which contain 50,000 images and 10,000 images respectively. ... We split these classes into 64 for training, 16 for validation, and 20 for testing. Each training and validation class splits its data by a Dirichlet distribution with parameter α = 0.1 (Lin et al., 2021).
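The Dirichlet-based heterogeneous split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, node count, and seed are assumptions; only the Dirichlet parameter α = 0.1 comes from the paper.

```python
import numpy as np

def dirichlet_split(labels, num_nodes, alpha=0.1, seed=0):
    """Partition sample indices across nodes, drawing per-class node
    proportions from Dirichlet(alpha), in the style of Lin et al. (2021)."""
    rng = np.random.default_rng(seed)
    node_indices = [[] for _ in range(num_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Smaller alpha -> more skewed per-class proportions, i.e. more
        # heterogeneous local datasets.
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cuts)):
            node_indices[node].extend(part.tolist())
    return node_indices

# Toy data: 10 classes with 100 samples each, split across 8 nodes.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_split(labels, num_nodes=8)
```

Every sample index is assigned to exactly one node; with α = 0.1, most nodes see only a few of the classes.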
Hardware Specification | No | The paper mentions running experiments but does not provide specific details about the hardware used (e.g., CPU, GPU models, or memory specifications).
Software Dependencies | No | In high-dimensional bilevel optimization, storing and computing the second-order matrices ∇²₁₂ g_i(x^(t), y^(t)) and ∇²₂₂ g_i(x^(t), y^(t)) is highly expensive... The Hessian/Jacobian-vector product is computed via PyTorch's forward-mode automatic differentiation in D-SOBA-SO, which is significantly faster than first computing the full Hessian/Jacobian matrix and then multiplying it by the vector.
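The finite-difference Hessian-vector approximation that lets D-SOBA-FO avoid forming second-order matrices can be sketched as below. This is a minimal illustrative version on a toy quadratic, not the paper's implementation; the helper name `fd_hvp` and the test function are assumptions, while the finite-difference parameter δ mirrors the paper's δ.

```python
import numpy as np

def fd_hvp(grad_fn, y, v, delta=1e-3):
    """Approximate the Hessian-vector product (∇²g(y)) v by a symmetric
    finite difference of gradients, never materializing the Hessian."""
    return (grad_fn(y + delta * v) - grad_fn(y - delta * v)) / (2.0 * delta)

# Toy quadratic g(y) = 0.5 * y^T A y, so grad g(y) = A y and the Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_g = lambda y: A @ y
y = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
approx = fd_hvp(grad_g, y, v)   # two gradient calls, no Hessian storage
exact = A @ v
```

For a quadratic the symmetric difference is exact up to floating-point error; for general g the error is O(δ²), which is why the paper treats δ as a tunable parameter.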
Experiment Setup | Yes | The step-sizes α_t, β_t, γ_t are initialized as 0.1 and multiplied by 0.8 per 1,000 iterations, while θ_t is set to 0.2. ... For all algorithms, the step-sizes are set to 0.1 and the batch sizes are set to 200. Meanwhile, the moving-average parameters θ_t of D-SOBA-SO, D-SOBA-FO and MA-DSBO are set to 0.8. ... We use a four-layer CNN with four convolution blocks, in which each block sequentially consists of a 3 x 3 convolution with 32 filters, batch normalization, ReLU activation, and 2 x 2 max pooling. ... The batch size is set to 32. The parameters of the last linear layer are set to be task-specific and the other parameters are global. We choose the step-size 0.002 for upper-level iterations and 0.01 for lower-level iterations for all algorithms. The step-size of auxiliary variables in D-SOBA-FO is 0.01 and the term δ_t in D-SOBA-FO is 0.001.
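The multiplicative step-size decay quoted above (initialize at 0.1, multiply by 0.8 every 1,000 iterations) can be written as a closed-form schedule. This is a sketch of the stated rule only; the function name is an assumption.

```python
def step_size(t, init=0.1, factor=0.8, period=1000):
    """Step size at iteration t under the quoted schedule:
    init * factor ** (number of completed 1,000-iteration periods)."""
    return init * factor ** (t // period)
```

So iterations 0-999 use 0.1, iterations 1000-1999 use 0.08, and so on.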