Rewarding the Rare: Maverick-Aware Shapley Valuation in Federated Learning
Authors: Mengwei Yang, Baturalp Buyukates, Athina Markopoulou
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets demonstrate that FedMS improves model performance and better recognizes valuable client contributions, even under scenarios involving adversaries, free-riders, and skewed or rare-class distributions. |
| Researcher Affiliation | Academia | Mengwei Yang (EMAIL), Department of Electrical Engineering and Computer Science, University of California, Irvine; Baturalp Buyukates (EMAIL), School of Computer Science, University of Birmingham; Athina Markopoulou (EMAIL), Department of Electrical Engineering and Computer Science, University of California, Irvine |
| Pseudocode | Yes | Algorithm 1: Maverick-Shapley GTG (MS-GTG) ... Algorithm 2: FedMS: a Maverick-Shapley Client Selection Mechanism for FL ... Algorithm 3: Maverick-Shapley MR (MS-MR) ... Algorithm 4: Maverick-Shapley TMR (MS-TMR) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Datasets. We use five benchmark datasets: (1) MNIST (Deng, 2012), consisting of handwritten digits, with 60,000 samples for training and 10,000 for testing; and (2) CIFAR-10 (Krizhevsky et al., 2009), consisting of colored images in 10 classes, with 50,000 samples for training and 10,000 for testing. The other three datasets come from MedMNIST (Yang et al., 2021; 2023), a large-scale MNIST-like collection of standardized biomedical images: (3) BloodMNIST (Acevedo et al., 2020), containing blood cell microscope data, with 11,959 samples for training, 1,712 for validation, and 3,421 for testing; (4) OrganAMNIST (Bilic et al., 2023; Xu et al., 2019), containing abdominal CT data, with 34,561 samples for training, 6,491 for validation, and 17,778 for testing; and (5) PathMNIST (Kather et al., 2019), consisting of colon pathology data, with 89,996 samples for training, 10,004 for validation, and 7,180 for testing. |
| Dataset Splits | Yes | For both MNIST and CIFAR-10, we follow common practice in existing Shapley-based FL methods (MR, TMR, and GTG) by using a balanced server-side validation set: we randomly split off 20% of the testing samples as the validation set. For the BloodMNIST, OrganAMNIST, and PathMNIST datasets, we use the provided validation sets, which follow the imbalanced distribution of the training samples. |
| Hardware Specification | Yes | Our algorithm is implemented in PyTorch and we run experiments on two NVIDIA RTX A5000 GPUs and two Intel Xeon Silver 4316 CPUs. |
| Software Dependencies | No | The paper mentions PyTorch but does not specify a version number or other key software dependencies with versions. |
| Experiment Setup | Yes | FL Setup. We train for 100 rounds on MNIST and 200 on CIFAR-10, both with a batch size of 64. We employ 5 local training rounds for MNIST and a single local training round for CIFAR-10. We train BloodMNIST for 200 rounds and OrganAMNIST for 100 rounds. PathMNIST is trained for 100 rounds under the Mavericks + Dir(10) and Mavericks + Dir(1) settings, and for 200 rounds under the Mavericks + Dir(0.1) setting. For the BloodMNIST, OrganAMNIST, and PathMNIST datasets, we use one local training round and a batch size of 64. The learning rate is 0.05 for all baselines on all datasets. Our algorithm is implemented in PyTorch and we run experiments on two NVIDIA RTX A5000 GPUs and two Intel Xeon Silver 4316 CPUs. ... For our proposed FedMS, we choose temperature T = 0.01, α = 0.8, and error thresholds ϵb = 0.01, ϵi = 0.001. |
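The pseudocode entries (MS-GTG, MS-MR, MS-TMR) all build on Shapley valuation of FL clients. As a point of reference, the exact Shapley value that such methods approximate can be sketched as below; the exhaustive enumeration costs O(2^n) model evaluations, which is precisely why the paper's approximation variants exist. The toy additive utility and client names are illustrative, not from the paper.

```python
import itertools
import math

def shapley_values(clients, utility):
    """Exact per-round Shapley values for a set of FL clients.

    `utility(S)` returns the performance (e.g. validation accuracy) of the
    model aggregated from client subset S (a frozenset). Exponential in
    len(clients); practical methods like MS-GTG approximate this.
    """
    n = len(clients)
    phi = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for r in range(n):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                # Standard Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[c] += weight * (utility(S | {c}) - utility(S))
    return phi

# Toy additive utility: client "m" (think: a Maverick holding a rare class)
# contributes the most. For an additive game, Shapley values equal the
# individual contributions.
contrib = {"a": 0.2, "b": 0.2, "m": 0.6}
u = lambda S: sum(contrib[c] for c in S)
phi = shapley_values(list(contrib), u)
print({c: round(v, 3) for c, v in phi.items()})
```

The efficiency axiom guarantees the values sum to the grand-coalition utility, here u({a, b, m}) = 1.0.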
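The reported hyperparameter T = 0.01 suggests a temperature-scaled softmax over client contribution scores during selection, where a low temperature concentrates probability mass on the highest-valued clients. The sketch below is a hypothetical illustration of that idea (the function name and without-replacement sampling scheme are our assumptions, not the FedMS algorithm itself):

```python
import math
import random

def select_clients(scores, k, T=0.01, seed=0):
    """Sample k distinct clients with probability proportional to
    softmax(score / T). With T = 0.01 (the paper's setting), selection
    is sharply biased toward high-scoring clients. Illustrative sketch.
    """
    rng = random.Random(seed)
    # Subtract the max score before exponentiating for numerical stability.
    mx = max(scores.values())
    pool = {c: math.exp((s - mx) / T) for c, s in scores.items()}
    chosen = set()
    for _ in range(k):
        names, weights = zip(*pool.items())
        pick = rng.choices(names, weights=weights)[0]
        chosen.add(pick)
        pool.pop(pick)  # sample without replacement
    return chosen

# Client "m" has the highest score, so it is selected almost surely.
print(select_clients({"a": 0.2, "b": 0.2, "m": 0.6}, k=2))
```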
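The Mavericks + Dir(10), Dir(1), and Dir(0.1) settings refer to Dirichlet-based non-IID partitioning of training data across clients, where a smaller concentration parameter α yields more skewed per-client class distributions. A generic sketch of this common FL recipe (not the paper's exact partitioning code) follows:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients: for each class, draw client
    proportions from Dir(alpha) and cut that class's samples accordingly.
    Smaller alpha (e.g. 0.1) -> more skewed, non-IID clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parts = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        props = rng.dirichlet([alpha] * n_clients)
        # Convert cumulative proportions into cut points over this class.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return parts

# 10 classes with 100 samples each; alpha = 0.1 gives highly skewed clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
print(sorted(len(p) for p in parts))
```

Every sample lands on exactly one client, so the partition sizes always sum to the dataset size.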
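The 20% server-side validation split described under Dataset Splits for MNIST and CIFAR-10 amounts to randomly carving a fifth of the test samples into a held-out validation set. A minimal sketch (the function name and fixed seed are our choices; the paper does not specify its seed):

```python
import random

def split_validation(test_indices, frac=0.2, seed=0):
    """Randomly split off `frac` of the test indices as a server-side
    validation set; returns (validation_indices, remaining_test_indices).
    """
    idx = list(test_indices)
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * frac)
    return idx[:cut], idx[cut:]

# MNIST has 10,000 test samples -> 2,000 validation, 8,000 test.
val, test = split_validation(range(10000))
print(len(val), len(test))  # 2000 8000
```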