Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PaDPaF: Partial Disentanglement with Partially-Federated GANs

Authors: Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental evaluation corroborates our findings, and we also discuss a theoretical motivation for the proposed approach. In this section, we show some experiments to show the capabilities of the PaDPaF model. First, we run a simplified version of the PaDPaF model on a simple linear regression problem with data generated following Simpson's Paradox as shown in Fig. 3. Next, we run the main experiment on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009). We compare our method with Ditto (Li et al., 2021a) as well as Ditto with FedProx (Li et al., 2018). We further demonstrate that our method also works with variational auto-encoders (VAEs) (Kingma & Welling, 2013). Finally, we show our model's performance on CelebA (Liu et al., 2015), mainly to show its abilities in generating and varying locally available attributes (i.e. styles) on data sharing the same content.
Researcher Affiliation | Academia | Abdulla Jasem Almansoori, EMAIL, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Samuel Horváth, EMAIL, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Martin Takáč, EMAIL, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Pseudocode | Yes | A Algorithm: In this section, we write the training algorithm in detail for completeness. The training is a straightforward application of FedAvg on GANs without its private parameters. Algorithm 1: Contrastive GANs with Partial FedAvg
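The "Partial FedAvg" idea quoted above (averaging only the shared parameters across clients while private, personalized parameters never leave the client) can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation; the function name and the dict-of-floats representation are assumptions.

```python
# Hypothetical sketch of partial federated averaging: only parameters
# listed as shared are averaged on the server; private (personalized)
# parameters stay untouched on each client.

def partial_fedavg(client_params, shared_keys):
    """Average shared parameters across clients; keep the rest local.

    client_params: list of dicts mapping parameter name -> float value
    shared_keys:   set of parameter names that are globally shared
    """
    n = len(client_params)
    # Server-side average for each shared parameter.
    averages = {
        k: sum(p[k] for p in client_params) / n
        for k in shared_keys
    }
    # Broadcast the averages back; private parameters are preserved as-is.
    return [
        {k: (averages[k] if k in shared_keys else v) for k, v in p.items()}
        for p in client_params
    ]
```

For example, averaging a shared "content" weight over two clients while each keeps its own "style" weight: `partial_fedavg([{"content_w": 1.0, "style_w": 5.0}, {"content_w": 3.0, "style_w": -1.0}], {"content_w"})` sets `content_w` to 2.0 on both clients and leaves each `style_w` unchanged.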
Open Source Code | Yes | We make the code publicly available for reproducibility at https://github.com/zeligism/PaDPaF.
Open Datasets | Yes | Next, we run the main experiment on MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009). ... Finally, we show our model's performance on CelebA (Liu et al., 2015), mainly to show its abilities in generating and varying locally available attributes (i.e. styles) on data sharing the same content.
Dataset Splits | Yes | The MNIST train dataset is partitioned into 8 subsets assigned to 8 clients, and each client is handled by a unique worker. ... In the case of conditioning on y, we further restrict the datasets and drop 50% of the labels from each partitioned dataset. ... We prepare the clients' data in the same way as the MNIST experiments but with slightly different data augmentations and 10 clients instead. ... The federated dataset is created by a partition based on attributes. Given 40 attributes and 10 clients, each client is given 4 unique attributes and only has access to data having any of these 4 attributes.
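The split described above (partition indices evenly across clients, then drop labels for 50% of each client's examples) can be sketched as follows. This is an assumed reconstruction for illustration only; the function name and data layout are not taken from the paper's code.

```python
import random

def partition_and_drop_labels(num_examples, num_clients, drop_frac=0.5, seed=0):
    """Shuffle dataset indices, shard them evenly across clients, and
    mark a fraction of each shard as unlabeled (label dropped)."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    shard_size = num_examples // num_clients
    clients = []
    for c in range(num_clients):
        shard = indices[c * shard_size:(c + 1) * shard_size]
        # Drop labels for the first drop_frac of the shard.
        n_drop = int(len(shard) * drop_frac)
        unlabeled = set(shard[:n_drop])
        clients.append({
            "labeled": [i for i in shard if i not in unlabeled],
            "unlabeled": sorted(unlabeled),
        })
    return clients
```

With 8 clients and `drop_frac=0.5`, each client ends up with half of its shard labeled and half unlabeled, mirroring the semi-supervised conditioning setup quoted above.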
Hardware Specification | Yes | We run all experiments on a single NVIDIA A100 SXM GPU 40GB.
Software Dependencies | No | We generally use Adam (Kingma & Ba, 2014) for both the client's and the server's optimizer. ... We write the PyTorch code for each data augmentation:
Experiment Setup | Yes | We use the FedAvg (McMahan et al., 2016) algorithm as a backbone. ... We found that choosing learning rates 0.01 and 0.001 for the server optimizer and the client optimizer, respectively, is a good starting point. We also use an exponential-decay learning rate schedule for both the server and the clients, with a decay rate of 0.99 for MNIST (0.98 for CelebA) per communication round. ... For Ditto and FedProx, we choose the prox parameter to be equal to 1.0. ... For MNIST, we train the models for a half epoch in each round to further restrict the local convergence for each client. ... For this experiment, we introduce 10 clients with different data augmentations and we train them using Adam with a local learning rate of 0.0003 and a global learning rate of 0.003. ... We train our model for approximately 200 communication rounds, with 2 epochs per round, and train the discriminators 5 times as frequently as the generator (i.e. TD = 5).
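The per-round exponential learning-rate decay quoted above reduces to a one-line formula. A minimal sketch, assuming the schedule is applied once per communication round (the function name is illustrative, not from the paper's code):

```python
def decayed_lr(base_lr, decay, comm_round):
    """Exponential learning-rate decay per communication round:
    lr_t = base_lr * decay ** t, e.g. decay=0.99 for MNIST or
    0.98 for CelebA as described in the setup above."""
    return base_lr * decay ** comm_round
```

For instance, with the server's starting rate of 0.01 and decay 0.99, `decayed_lr(0.01, 0.99, 0)` returns 0.01, and after 100 rounds the rate has dropped to roughly a third of its initial value.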