Federated Stochastic Bilevel Optimization with Fully First-Order Gradients
Authors: Yihan Zhang, Rohit Dhaipule, Chiu C. Tan, Haibin Ling, Hongchang Gao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the extensive experimental results confirm the efficacy of our proposed algorithm. ... In this section, we evaluate the performance of our Algorithm 1 on the commonly used benchmark tasks: hyperparameter optimization and hyper-representation learning. |
| Researcher Affiliation | Academia | ¹Temple University, ²Stony Brook University |
| Pseudocode | Yes | Algorithm 1 FedSVRBGD-FO. Input: x0, y0, z0, η > 0, αx > 0, αy > 0, αz > 0, βx > 0, βy > 0, βz > 0. ... |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In this experiment, we used three benchmark datasets: a9a, w8a, covtype, which are obtained from the LIBSVM datasets². Footnote 2: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | Yes | Here, 10% of the samples are randomly selected as the test set. For the remaining samples, 70% of them are randomly selected as the training set and the others are used as the validation set. |
| Hardware Specification | Yes | Then, we run all experiments on a workstation with 4 NVIDIA A5000 GPU cards, each of which accommodates two threads to simulate eight workers. |
| Software Dependencies | No | The paper does not provide specific software versions for libraries or frameworks used (e.g., Python version, PyTorch version, etc.). |
| Experiment Setup | Yes | In our experiment, we set the solution accuracy ϵ to 0.1. Then, according to the theoretical results in [Tarzanagh et al., 2022; Huang et al., 2023; Gao, 2022; Li et al., 2024], the learning rate of FedNEST and FedMBO is set to ϵ², while that of LocalBSGVRM and FedBiOAcc is set to ϵ. Regarding our method, according to Theorem 4.8, we set the learning rate η to ϵ², the coefficients αx = αy = αz = ϵ, and the penalty λ = 5/ϵ. Moreover, for all methods using the momentum-based variance reduction technique, we set the coefficient of the momentum to 0.1. ... The batch size for each worker is set to 10 in this experiment. ... The communication period is set to 4 in this experiment. ... In our experiment, we used a fully connected two-layer neural network. The dimensionality of its input, hidden, and output layers is 54, 30, and 7, respectively. The dataset is covtype, which has seven classes. The batch size is 100. The communication period p = 4. |
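The ϵ-dependent hyperparameter settings quoted in the Experiment Setup row can be collected into a small sketch. This is an illustrative helper only, assuming ϵ = 0.1 as stated; the function name and dictionary keys are our own, not from the paper.

```python
def setup_hyperparameters(epsilon: float = 0.1) -> dict:
    """Sketch of the epsilon-dependent settings reported in the paper's
    experiment section (names here are illustrative assumptions)."""
    return {
        # Baselines FedNEST / FedMBO: learning rate set to eps^2
        "lr_fednest_fedmbo": epsilon ** 2,
        # Baselines LocalBSGVRM / FedBiOAcc: learning rate set to eps
        "lr_localbsgvrm_fedbioacc": epsilon,
        # Proposed method (per Theorem 4.8): eta = eps^2, alpha_x = alpha_y = alpha_z = eps
        "eta": epsilon ** 2,
        "alpha": epsilon,
        # Penalty lambda = 5 / eps
        "penalty_lambda": 5.0 / epsilon,
        # Momentum coefficient for variance-reduction-based methods
        "momentum": 0.1,
    }

params = setup_hyperparameters(0.1)
```

With ϵ = 0.1 this yields η = 0.01, α = 0.1, and λ = 50, matching the values implied by the quoted setup.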