First-Order Federated Bilevel Learning
Authors: Yifan Yang, Peiyao Xiao, Shiqian Ma, Kaiyi Ji
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, conducted on this application and federated hyper-representation, demonstrate the effectiveness of the proposed algorithm. [...] We first conduct an experiment on federated data cleaning to compare the performance of our proposed MemFBO algorithms with multiple benchmark personalized federated learning algorithms [...]. We then perform a federated hyper-representation experiment to compare the performance of our method with other FBO algorithms [...]. The former experiment tests the robustness of different methods on the CIFAR10 dataset with CNN backbones [...]. The latter experiment tests the communication and memory efficiency on MNIST with MLP backbones [...]. The test accuracy of the different (personalized) federated learning methods under different noise levels is reported in Table 1. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, University at Buffalo 2Department of Computational Applied Mathematics and Operations Research, Rice University EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: MemFBO |
| Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code or a link to a code repository. |
| Open Datasets | Yes | The former experiment tests the robustness of different methods on the CIFAR10 dataset with CNN backbones, following the same experimental setup as in (Collins et al. 2021). The latter experiment tests the communication and memory efficiency on MNIST with MLP backbones, following the experiment setup in (Tarzanagh et al. 2022). |
| Dataset Splits | Yes | For the baselines, we use 100 clients with the CIFAR10 dataset split into 500 training and 100 test samples per client. To simulate noise, the 500 training samples are divided into 450 noisy samples and 50 clean samples. A subset of the 450 samples is corrupted based on a noise rate (the proportion of corrupted data) and a flip rate. In this experiment, the noise rate is set to 0%, 30%, 50%, and 70%. The flip rate determines the portion of a corrupted sample's labels that are reassigned to one of all labels uniformly at random. We fix the flip rate at 80%. Finally, we set up the data heterogeneity following the same procedure as in FedRep (Collins et al. 2021), where each client is assigned 2 classes. |
| Hardware Specification | No | The paper discusses 'edge devices like smartphones' in the context of the problem being solved, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments described in Section 6. |
| Software Dependencies | No | The paper mentions 'CNN backbones' and 'MLP backbones' but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions) that were used for implementation or experimentation. |
| Experiment Setup | Yes | For the baselines, we use 100 clients with the CIFAR10 dataset split into 500 training and 100 test samples per client. To simulate noise, the 500 training samples are divided into 450 noisy samples and 50 clean samples. A subset of the 450 samples is corrupted based on a noise rate (the proportion of corrupted data) and a flip rate. In this experiment, the noise rate is set to 0%, 30%, 50%, and 70%. The flip rate determines the portion of a corrupted sample's labels that are reassigned to one of all labels uniformly at random. We fix the flip rate at 80%. Finally, we set up the data heterogeneity following the same procedure as in FedRep (Collins et al. 2021), where each client is assigned 2 classes. |
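As an illustration of the label-noise procedure quoted above (the noise rate selects which of the noisy-candidate samples are corrupted; the flip rate then reassigns labels uniformly at random among all classes), here is a minimal sketch. The function name and the exact sampling details are our assumptions for clarity, not the authors' released implementation:

```python
import random

def corrupt_labels(labels, noise_rate, flip_rate, num_classes=10, seed=0):
    """Illustrative label-noise simulation (assumed, not the paper's code):
    select a `noise_rate` fraction of samples as corrupted, then flip each
    corrupted sample's label with probability `flip_rate`, reassigning it
    uniformly at random among all `num_classes` labels."""
    rng = random.Random(seed)
    labels = list(labels)
    n = len(labels)
    corrupted_idx = rng.sample(range(n), int(noise_rate * n))
    for i in corrupted_idx:
        if rng.random() < flip_rate:
            # Uniform over all labels, so the new label may equal the old one.
            labels[i] = rng.randrange(num_classes)
    return labels

# Example matching the quoted setup: 450 noisy-candidate samples,
# a 50% noise rate, and an 80% flip rate on CIFAR10 (10 classes).
clean = [i % 10 for i in range(450)]
noisy = corrupt_labels(clean, noise_rate=0.5, flip_rate=0.8)
changed = sum(a != b for a, b in zip(clean, noisy))
print(f"{changed} of 450 labels changed")
```

Note that with a uniform reassignment over all 10 classes, roughly one in ten flips lands back on the original label, so the observed fraction of changed labels is slightly below noise_rate × flip_rate.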