Personalized Federated Learning of Probabilistic Models: A PAC-Bayesian Approach

Authors: Mahrokh Ghoddousi Boroujeni, Andreas Krause, Giancarlo Ferrari-Trecate

TMLR 2025

Reproducibility assessment: variable, result, and supporting excerpt.
Research Type: Experimental
"We evaluate PAC-PFL on Gaussian Process (GP) regression and Bayesian Neural Network (BNN) classification as representative examples of probabilistic models. Our experiments demonstrate that PAC-PFL yields accurate and well-calibrated predictions (c1), even in highly heterogeneous (c2) and data-poor (c3) scenarios."
Researcher Affiliation: Academia
Mahrokh G. Boroujeni (Institute of Mechanical Engineering, EPFL, Switzerland); Andreas Krause (Department of Computer Science, ETH Zürich, Switzerland); Giancarlo Ferrari-Trecate (Institute of Mechanical Engineering, EPFL, Switzerland)
Pseudocode: Yes
"Algorithm 1: PAC-PFL executed by the server... Algorithm 2: Client_Update for client i with dataset Si... Algorithm 3: Differentially private PAC-PFL with 1 SVGD particle executed by the server"
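Only the algorithm captions are quoted above, but they already reveal the overall structure: a server loop that delegates per-client updates of particle-based hyper-posteriors. A purely illustrative sketch of that structure is shown below; every name and the toy "nudge toward the local mean" update rule are our own placeholders, not the paper's PAC-Bayesian SVGD updates.

```python
import random

def client_update(particles, local_data):
    """Hypothetical stand-in for Client_Update (Algorithm 2): refine each
    hyper-posterior particle using the client's local dataset. Here we just
    nudge particles toward the local data mean as a placeholder."""
    local_mean = sum(local_data) / len(local_data)
    return [0.9 * p + 0.1 * local_mean for p in particles]

def server_round(particles, client_datasets, frac=0.5, seed=0):
    """Hypothetical stand-in for one round of the server loop (Algorithm 1):
    sample a subset of clients, collect their updated particles, and
    aggregate the updates by averaging."""
    rng = random.Random(seed)
    num_chosen = max(1, int(frac * len(client_datasets)))
    chosen = rng.sample(client_datasets, num_chosen)
    updates = [client_update(particles, data) for data in chosen]
    # Average the i-th particle across all participating clients.
    return [sum(ps) / len(ps) for ps in zip(*updates)]
```

The key design point the captions imply is that clients exchange hyper-posterior particles rather than raw data, which is what makes the differentially private variant (Algorithm 3) possible at the server.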
Open Source Code: Yes
"The codebase for our algorithm is available on https://sites.google.com/view/pac-pfl. ... The source code for our PAC-PFL implementation using GP is accessible within the same Google Drive repository. Upon acceptance, we intend to make the source code for BNN publicly available. To facilitate the use of our software, we have incorporated a demonstration Jupyter Notebook in the source code repository."
Open Datasets: Yes
"We employ the FEMNIST dataset, which is curated and maintained by the LEAF project (Caldas et al., 2019). ... The PV dataset can be accessed via the following link: https://drive.google.com/drive/folders/153MeAlntN4VORHdgYQ3wG3OylW0SlBf9?usp=sharing. ... The EMNIST dataset is detailed in Appendix 8.5."
Dataset Splits: Yes
"We utilize the original train-test split provided with the data, without any additional preprocessing. ... The first dataset comprises the initial two weeks of June 2018, which provides a total of 150 samples for each client. The second dataset encompasses the data from both June and July 2018, resulting in 610 training samples per client. For all experiments, the test dataset consists of the data from June and July 2019."
Hardware Specification: No
The paper does not state the hardware used for its experiments (e.g., GPU/CPU models or memory); it only alludes to general computing environments such as running "on a server" or "on a cluster".
Software Dependencies: No
The paper mentions the pvlib Python library (Holmgren et al., 2022) and refers to other implementations, but does not give version numbers for the key software components used in its own experiments (e.g., Python, PyTorch, TensorFlow, or CUDA).
Experiment Setup: Yes
"For all neural networks, we explore structures with the same number of neurons per layer. The number of neurons per layer can take values of 2^n for n ∈ {1, ..., 6}, and we consider 2 or 4 hidden layers. For PAC-PFL, we employ 4 SVGD particles and set k = 4. The parameter β is set to the number of samples for each client, β = m_i. ... In all PV experiments, we set the hyper-prior mean for the neural network weights and biases to 0 and the hyper-prior mean for the noise standard deviation to 0.4."
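The quoted search space is small and easy to make concrete: widths 2^n for n ∈ {1, ..., 6} crossed with 2 or 4 hidden layers gives 12 architectures. The sketch below enumerates that grid; all variable names are ours, and β is noted as per-client (β = m_i) rather than a fixed constant.

```python
from itertools import product

# Architecture grid quoted from the paper's setup description.
widths = [2 ** n for n in range(1, 7)]  # neurons per layer: 2, 4, 8, 16, 32, 64
depths = [2, 4]                         # number of hidden layers

# Every hidden layer within one configuration uses the same width.
grid = [{"hidden_layers": d, "neurons_per_layer": w}
        for d, w in product(depths, widths)]

# PAC-PFL-specific settings quoted above (dict keys are our own naming).
pac_pfl_settings = {
    "svgd_particles": 4,  # "we employ 4 SVGD particles"
    "k": 4,               # "set k = 4"
    # beta = m_i, the client's own sample count, so it is computed per
    # client at runtime rather than fixed here.
}
```

Enumerating the grid this way makes the total search budget explicit: 6 widths x 2 depths = 12 network structures per experiment.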