Federated Variational Inference: Towards Improved Personalization and Generalization

Authors: Elahe Vedadi, Joshua V. Dillon, Philip Andrew Mansfield, Karan Singhal, Arash Afkanpour, Warren Richard Morningstar

TMLR 2024

Reproducibility assessment. Each entry lists the variable, the result, and the supporting LLM response:
Research Type: Experimental. "We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks." (Section 5, Implementation and Experimental Evaluation)
Researcher Affiliation: Industry. Elahe Vedadi (Google Research), Joshua V. Dillon (Google Research), Philip Andrew Mansfield (Google Research), Karan Singhal (Google Research), Arash Afkanpour (Google Research), Warren Richard Morningstar (Google Research).
Pseudocode: Yes. Algorithm 1 (FedVI Training).
Open Source Code: No. The paper states, "We implement our FedVI algorithm in TensorFlow Federated (TFF)," but provides neither a link to the source code nor an explicit statement of its release.
Open Datasets: Yes. "We evaluate FedVI algorithm on two different datasets, FEMNIST (Caldas et al., 2019) (62-class digit and character classification) and CIFAR-100 (Krizhevsky et al., 2009) (100-class classification)." Both are distributed with TFF: https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/emnist/load_data and https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/cifar100/load_data
Dataset Splits: Yes. "For FEMNIST dataset with 3400 clients we consider the first 20 clients as non-participating users which are held-out in training to better measure generalization as in (Yuan et al., 2022). At each round of training we select 100 clients uniformly at random without replacement, but with replacement across rounds. For CIFAR-100 with 500 training clients, we set the data of the first 10 clients as held-out data and select 50 clients uniformly at random at each round. ... at each epoch we consider the first 50% of each mini-batch as the support set and the other 50% as the query set (i.e., for a mini-batch with 256 data samples the first 128 samples belong to the support set and the rest belong to the query set)."
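The support/query split quoted above is a simple in-order partition of each mini-batch. A minimal sketch in plain Python (the function name `split_support_query` is illustrative, not from the paper):

```python
# Sketch of the split described in the paper: the first 50% of each
# mini-batch is the support set, the remaining 50% is the query set.
def split_support_query(batch, support_fraction=0.5):
    """Split one mini-batch into (support, query) subsets, in order."""
    cut = int(len(batch) * support_fraction)
    return batch[:cut], batch[cut:]

# A 256-sample mini-batch yields 128 support and 128 query samples.
batch = list(range(256))
support, query = split_support_query(batch)
```

With the default fraction this reproduces the quoted 128/128 split for a 256-sample mini-batch.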
Hardware Specification: Yes. "We implement our FedVI algorithm in TensorFlow Federated (TFF) and scale up the implementation to NVIDIA Tesla V100 GPUs for hyperparameter tuning."
Software Dependencies: No. The paper names TensorFlow Federated (TFF) as the implementation framework but gives no version numbers for TFF or for any other library or dependency.
Experiment Setup: Yes. "We train FedVI algorithm on both FEMNIST and CIFAR-100 for 1500 rounds and at each round of training we divide both datasets into mini-batches of 256 data samples and use mini-batch gradient descent to optimize the objective function. ... We use Stochastic Gradient Descent (SGD) for our client optimizer and SGD with momentum for the server optimizer for all experiments (Reddi et al., 2020). We set the client learning rate to 0.03 for CIFAR-100 and 0.02 for FEMNIST, and the server learning rate to 3.0 with momentum 0.9 for both datasets. ... FedVI results are reported for τ = 10^-9 for FEMNIST, and τ = 10^-3 for CIFAR-100."
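The client/server optimizer pairing quoted above follows the FedOpt pattern (Reddi et al., 2020): clients run plain SGD locally, and the server treats the negative average client delta as a pseudo-gradient for an SGD-with-momentum step. A hedged sketch on a scalar "model", for illustration only; the function names are assumptions, and only the FEMNIST hyperparameters quoted above (client lr 0.02, server lr 3.0, momentum 0.9) are from the paper:

```python
# Toy FedOpt-style round on a scalar parameter (sketch, not the paper's code).
def client_sgd(w, grads, lr=0.02):
    """Run local SGD steps from the server model w; return the client delta."""
    local = w
    for g in grads:
        local -= lr * g
    return local - w

def server_update(w, deltas, velocity, lr=3.0, momentum=0.9):
    """SGD-with-momentum server step on the averaged pseudo-gradient."""
    pseudo_grad = -sum(deltas) / len(deltas)  # negative mean client delta
    velocity = momentum * velocity + pseudo_grad
    return w - lr * velocity, velocity

w, v = 1.0, 0.0
# Three toy clients, each seeing the same two local gradients.
deltas = [client_sgd(w, grads=[0.5, 0.25]) for _ in range(3)]
w, v = server_update(w, deltas, v)
```

In a real TFF run, `w` would be the server model weights and each delta would come from one sampled client's local mini-batch SGD pass.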