Multi-Session Budget Optimization for Forward Auction-based Federated Learning

Authors: Xiaoli Tang, Han Yu, Zengxiang Li, Xiaoxiao Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline.
Researcher Affiliation | Academia | ¹College of Computing and Data Science, Nanyang Technological University, Singapore; ²Department of Electrical and Computer Engineering, The University of British Columbia, Canada; ³Vector Institute, Canada. Correspondence to: Han Yu <EMAIL>.
Pseudocode | Yes | Algorithm 1: The training procedure of MBOS-AFL
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | The performance assessment of MBOS-AFL is conducted on the following six widely-adopted datasets in FL studies: 1) MNIST (http://yann.lecun.com/exdb/mnist/), 2) CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), 3) Fashion MNIST (i.e., FMNIST) (Xiao et al., 2017), 4) EMNIST-digits (i.e., EMNISTD), 5) EMNIST-letters (i.e., EMNISTL) (Cohen et al., 2017), and 6) Kuzushiji-MNIST (i.e., KMNIST) (Clanuwat et al., 2018).
Dataset Splits | Yes | Experiment Scenarios: We compare MBOS-AFL with baselines under two main experiment scenarios, each containing 10,000 DOs: 1) IID data, varying dataset sizes, without noise: In this scenario, the sizes of datasets owned by various DOs are randomly generated, ranging from 500 to 5,000 samples. [...] 2) Non-IID data, with noise: In this experimental scenario, we deliberately introduce data heterogeneity by adjusting the class distribution among individual DOs. Following (Tang et al., 2024b), we implement the following Non-IID setup. [...] In this experiment scenario, each DO holds 3,000 images.
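The two data-partition scenarios quoted above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the paper specifies only the 500–5,000 size range and the fixed 3,000-image holding, not the sampling code itself.

```python
import random

NUM_DOS = 10_000  # both scenarios involve 10,000 data owners (DOs)

def iid_dataset_sizes(seed: int = 0) -> list[int]:
    """Scenario 1 (IID, varying sizes, no noise): each DO's dataset size
    is drawn uniformly at random from 500 to 5,000 samples.
    Uniform sampling is an assumption; the paper says only 'randomly
    generated, ranging from 500 to 5,000 samples'."""
    rng = random.Random(seed)
    return [rng.randint(500, 5000) for _ in range(NUM_DOS)]

def non_iid_dataset_sizes() -> list[int]:
    """Scenario 2 (Non-IID, with noise): every DO holds exactly 3,000
    images; heterogeneity comes from the class distribution
    (following Tang et al., 2024b), not from the dataset size."""
    return [3000] * NUM_DOS
```

Note that in Scenario 2 only the class distribution varies across DOs, so the size generator is deliberately constant.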
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions using the Deep Q-Network (DQN) technique, neural networks, and RMSprop for training, but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | The replay buffer D of both the Inter BPA and the Intra BMA is set to 5,000. During training, both agents explore the environment using an ϵ-greedy policy with an annealing rate from 1.0 to 0.05. In updating both Qintra and Qinter, 64 tuples uniformly sampled from D are used for each training step, and the corresponding target networks are updated once every 20 steps. In our experiments, we use RMSprop with a learning rate of 0.0005 to train all neural networks, and set the discount factor γ to 1. In addition, we have set the number of candidate DOs within each session to 200 (i.e., Cs = 200). The communication round in each session is set at 100, while the local training epoch is set at 30.
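The reported hyperparameters can be collected into a single configuration sketch. The structure below is illustrative, not the authors' code; in particular, the paper gives the ϵ annealing range (1.0 → 0.05) but not the schedule shape, so the linear decay here is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Hyperparameters reported for both the Inter BPA and Intra BMA
    agents (field names are illustrative)."""
    replay_buffer_size: int = 5_000   # replay buffer D
    batch_size: int = 64              # tuples uniformly sampled from D per step
    target_update_every: int = 20     # target-network sync interval (steps)
    learning_rate: float = 0.0005     # RMSprop learning rate
    gamma: float = 1.0                # discount factor
    eps_start: float = 1.0            # epsilon-greedy exploration, initial
    eps_end: float = 0.05             # ...annealed down to this value

def epsilon(step: int, total_steps: int, cfg: AgentConfig = AgentConfig()) -> float:
    """Linearly anneal the exploration rate from eps_start to eps_end,
    then hold it at eps_end (linear decay is an assumption)."""
    frac = min(step / total_steps, 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)
```

The remaining settings (Cs = 200 candidate DOs per session, 100 communication rounds per session, 30 local epochs) belong to the FL environment rather than the DQN agents, so they would live in a separate session-level config.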