Personalised Federated Learning On Heterogeneous Feature Spaces
Authors: Alain Rakotomamonjy, Maxime Vono, Hamlet Jesse Medina Ruiz, Liva Ralaivola
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For numerically validating the benefits associated with the proposed methodology FLIC, we consider toy problems with different characteristics of heterogeneity, as well as experiments on real data, namely (i) a digit classification problem from images of different sizes, (ii) an object classification problem from either images or text captioning on clients, and (iii) a Brain-Computer Interfaces problem. Code for reproducing part of the experiments is available at https://github.com/arakotom/flic. |
| Researcher Affiliation | Industry | Alain Rakotomamonjy, Maxime Vono, Hamlet Jesse Medina Ruiz, and Liva Ralaivola (Criteo AI Lab, Paris, France) |
| Pseudocode | Yes | In Algorithm 1, we detail the pseudo-code associated with a specific instance of the proposed methodology when FedRep is used to learn the model parameters {θ_i}, i ∈ [b], under the FL paradigm. Algorithm 1: FLIC for FedRep |
| Open Source Code | Yes | Code for reproducing part of the experiments is available at https://github.com/arakotom/flic. |
| Open Datasets | Yes | The second problem involves digit classification using the MNIST and USPS datasets, with dimensions of 28×28 and 16×16, respectively. The third experiment addresses a multimodal problem using a subset of the TextCaps dataset (Sidorov et al., 2020), an image captioning dataset. Finally, the fourth problem is a real medical problem, Brain-Computer Interface (BCI), which consists in classifying mental imagery EEG recordings into five classes. The dataset we consider is based on six datasets from the mental imagery MOABB data repository (Jayaram & Barachant, 2018). |
| Dataset Splits | Yes | We use the natural train/test split of those datasets and randomly share them across clients. At each run, those pairs are separated into 80% train and 20% test sets. For each subject, we select the predefined train/test splits or use 75% of the trials for training and the remaining 25% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | For training, all methods use Adam with a default learning rate of 0.001 and a batch size of 100. Other hyperparameters have been set as follows. Unless specified, the regularization strengths λ1 and λ2 have been fixed to 0.001. The local sample batch size is set to 100 and the participation rate r to 0.1. For all experiments, we have set the number of communication rounds T to 50 and the number of local epochs to 10 and 100 for the real-world and toy datasets, respectively. For FLIC, as in FedRep, these local epochs are followed by one epoch of representation learning. We have trained the local embedding functions for 100 local epochs, with a batch size of 10 for the toy datasets and TextCaps, and 100 for MNIST-USPS and BCI. |
| Experiment Setup | Yes | For all experiments, we consider T = 50 communication rounds for all algorithms and, at each round, a client participation rate of r = 0.1. The number of local epochs for training has been set to M = 10. As optimiser, we have used Adam with a learning rate of 0.001 for all problems and approaches. Further details are given in Section S3.3 in the supplement. For each component of the latent anchor distribution, we consider a Gaussian with learnable mean vectors and a fixed identity covariance matrix. As such, the Wasserstein barycenter computation boils down to simply averaging the means of the client updates, and for computing the third term in (3), we just sample from the Gaussian distribution. Accuracies are computed as the average accuracy over all clients after the last epoch in which all local models are trained. |
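The setup row notes that with Gaussian anchor components of fixed identity covariance, the server-side Wasserstein barycenter reduces to a plain average of the clients' mean vectors. A minimal sketch of that aggregation inside the quoted federated loop (T = 50 rounds, participation rate r = 0.1) is below; the client count, latent dimension, number of components, and the stand-in "local update" perturbation are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters quoted in the paper's setup.
T = 50           # communication rounds
r = 0.1          # client participation rate per round

# Illustrative assumptions (not specified in the paper).
n_clients = 100
n_components = 5   # one Gaussian anchor per component
latent_dim = 8

# Each client holds learnable mean vectors for its latent anchor
# distribution; covariances are fixed to the identity.
client_means = rng.normal(size=(n_clients, n_components, latent_dim))

for t in range(T):
    # Sample the participating clients for this round.
    k = max(1, int(r * n_clients))
    active = rng.choice(n_clients, size=k, replace=False)

    # Stand-in for local training: each active client nudges its means.
    for i in active:
        client_means[i] += 0.01 * rng.normal(size=(n_components, latent_dim))

    # Server aggregation: the 2-Wasserstein barycenter of Gaussians that
    # all share the identity covariance is itself a Gaussian whose mean
    # is the arithmetic mean of the client means.
    barycenter_means = client_means[active].mean(axis=0)

    # Broadcast the barycenter back to the active clients.
    client_means[active] = barycenter_means

print(barycenter_means.shape)  # (5, 8)
```

With non-identity (or client-specific) covariances the barycenter would instead require a fixed-point iteration on the covariance matrices; the identity choice is what makes the server step a one-line average.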