FedBEns: One-Shot Federated Learning based on Bayesian Ensemble

Authors: Jacopo Talpini, Marco Savi, Giovanni Neglia

ICML 2025

Reproducibility variables, with the assessed result and the supporting LLM response:
Research Type: Experimental. "We conduct extensive experiments on various datasets, demonstrating that the proposed method outperforms competing baselines that typically rely on unimodal approximations of the local losses. In Section 5 we describe the experimental setup, the benchmark approaches from the literature, and the considered datasets. In Section 6 we provide and discuss the numerical results."
Researcher Affiliation: Academia. "1Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milan, Italy; 2Inria, Université Côte d'Azur, Sophia Antipolis, France."
Pseudocode: Yes. "Algorithm 1 FedBEns: One-Shot FL through Federated Bayesian Ensemble"
Open Source Code: Yes. "Our code is available at: https://github.com/jacopot96/FedBEns"
Open Datasets: Yes. "FashionMNIST (Xiao et al., 2017). SVHN (Netzer et al., 2011). CIFAR10 (Krizhevsky et al., 2010)."
Dataset Splits: Yes. "To simulate data heterogeneity among C client datasets, we partitioned the original image dataset into C subsets using a symmetric Dirichlet sampling procedure with parameter α (Hsu et al., 2019), where a smaller value of α results in a more heterogeneous data split. In addition, data are normalized before splitting, and a small fraction of the training data (i.e., 500 samples) is kept at the server as validation data for the hyperparameter tuning of the various approaches."
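The symmetric Dirichlet partitioning described above can be sketched as follows. The function name and signature are illustrative, not taken from the authors' repository; per Hsu et al. (2019), each class's samples are divided among clients according to proportions drawn from a symmetric Dirichlet prior.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices into num_clients subsets via symmetric
    Dirichlet sampling over per-class proportions (Hsu et al., 2019).
    A smaller alpha yields a more heterogeneous (non-IID) split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw this class's client shares from a symmetric Dirichlet prior.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert the shares into cut points over this class's samples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]
```

Every sample is assigned to exactly one client; with α = 0.5 (a value commonly used in heterogeneous FL benchmarks) most clients end up dominated by a few classes.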
Hardware Specification: Yes. "All the experiments were performed on a machine equipped with an Intel Xeon 4114 CPU and an NVIDIA Titan Xp GPU with 12 GB of RAM."
Software Dependencies: No. "We kept the basic training procedure fixed for all the approaches and experiments to compare them fairly. SGD is utilized as local optimizer... For the various parameters of each baseline, we used their default settings and the official implementations provided by the authors. Concerning FedBEns, we used the standard PyTorch weights initializer for the random initialization of the local ensemble. To perform Laplace approximation of the local posteriors, we exploited the open-source laplace package (Daxberger et al., 2021a), an easy-to-use software library for PyTorch offering access to the most common LA methods, discussed in Section 4.3. The server performs 300 steps to minimize the global loss using the Adam optimizer with its standard default hyperparameters."
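As a dependency-free illustration of what the laplace package automates for PyTorch models, here is a minimal 1-D sketch of the Laplace approximation: fit a Gaussian at the posterior mode whose variance is the inverse curvature of the negative log posterior. The function name and the grid-based finite-difference scheme are illustrative only; the actual package estimates (diagonal, Kronecker-factored, etc.) Hessians of neural-network posteriors.

```python
import numpy as np

def laplace_approx_1d(neg_log_post, theta_grid):
    """1-D Laplace approximation: locate the MAP on a grid, estimate the
    curvature of the negative log posterior by central finite differences,
    and return the mean and variance of the Gaussian N(map, 1/curvature)."""
    f = neg_log_post(theta_grid)
    i = int(np.argmin(f))            # MAP estimate on the grid
    h = theta_grid[1] - theta_grid[0]
    curvature = (f[i - 1] - 2 * f[i] + f[i + 1]) / h**2  # second derivative
    return theta_grid[i], 1.0 / curvature
```

For a Gaussian posterior the approximation is exact: with negative log density 0.5 * (θ - 2)² / 0.25 it recovers mean 2 and variance 0.25.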
Experiment Setup: Yes. "SGD is utilized as local optimizer, with each client training for 20 epochs on LeNet and 50 epochs on the more complex CNNs for SVHN and CIFAR10. Moreover, we set the batch size to 64, the learning rate ηc to 0.01, and the momentum to 0.9, in line with (Jhunjhunwala et al., 2024). For the various parameters of each baseline, we used their default settings and the official implementations provided by the authors. [...] The server performs 300 steps to minimize the global loss using the Adam optimizer with its standard default hyperparameters. During each server's optimizer run, conducted separately for each ensemble model, the validation performance is evaluated every 30 steps and the parameter configuration that achieves the best validation performance is selected as the final component of the ensemble. Last, for all the experiments we employed a (cold) tempered posterior for each client, with T = 0.1, and a diagonal Gaussian as prior with variance σ2 = 0.1."
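The server-side procedure described above (a fixed step budget with periodic validation checkpoints, keeping the best configuration) can be sketched as follows. Plain gradient descent stands in for Adam to keep the sketch dependency-free, and all names are illustrative rather than from the authors' code.

```python
import numpy as np

def optimize_with_validation(params, grad_fn, val_score_fn,
                             steps=300, eval_every=30, lr=0.01):
    """Run a fixed number of optimization steps on the global loss and
    return the checkpoint with the best validation score, mirroring the
    evaluate-every-30-steps selection rule quoted above."""
    best_params, best_score = params.copy(), val_score_fn(params)
    for step in range(1, steps + 1):
        params = params - lr * grad_fn(params)   # gradient step (Adam in the paper)
        if step % eval_every == 0:               # periodic validation checkpoint
            score = val_score_fn(params)
            if score > best_score:
                best_params, best_score = params.copy(), score
    return best_params, best_score
```

In FedBEns this loop would run once per ensemble component, with the 500 held-out server samples providing the validation score.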