Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
Authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical results by providing real-world experiments and training generative adversarial networks on multiple GPUs. |
| Researcher Affiliation | Academia | These authors contributed equally to this work. Department of Informatics, University of Oslo. Work performed at EPFL (LIONS) and Aalborg University. Laboratory for Information and Inference Systems (LIONS), EPFL. |
| Pseudocode | Yes | Algorithm 1 Q-Gen X: Loops are executed in parallel on processors. At certain steps, each processor computes sufficient statistics of a parametric distribution to estimate the distribution of dual vectors. |
| Open Source Code | Yes | Open source code will be released at https://github.com/LIONS-EPFL/qgenx |
| Open Datasets | Yes | train a WGAN-GP (Arjovsky et al., 2017) on CIFAR10 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper states it uses CIFAR10, which has standard splits, but does not explicitly provide the training/validation/test split percentages or sample counts in the text. |
| Hardware Specification | Yes | The experiments were performed on 3 Nvidia V100 GPUs (1 per node) using a Kubernetes cluster and an image built on the torch_cgx Docker image. |
| Software Dependencies | No | The paper mentions "torch_cgx pytorch extension", "Open MPI", and "Weights and Biases", but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | we share an effective batch size of 1024 across 3 nodes (strong scaling) connected via Ethernet, and use Layernorm (Ba et al., 2016) instead of Batchnorm (Ioffe & Szegedy, 2015). The results are shown in Figure 1 (left), showing the evolution of the FID. |
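For context on the method being reproduced: Q-GenX builds on the extra-gradient update, which takes an extrapolation (half) step and then re-evaluates the operator at the extrapolated point. The sketch below is a minimal single-node illustration on a toy bilinear saddle-point problem (min over x, max over y of x*y); it is not the paper's quantized distributed algorithm, and the step size and iteration count are illustrative choices, not values from the paper.

```python
# Minimal extra-gradient sketch for the toy saddle-point problem
# min_x max_y x*y, whose monotone operator is F(x, y) = (y, -x).
# This shows only the base update rule; the paper's Q-GenX adds
# quantized gradient communication across nodes, not modeled here.

def operator(x, y):
    # Gradient field of the bilinear game: (d/dx, -d/dy) of x*y
    return y, -x

def extragradient(x, y, step=0.3, iters=300):
    for _ in range(iters):
        # Extrapolation (half) step
        gx, gy = operator(x, y)
        xh, yh = x - step * gx, y - step * gy
        # Update step uses the operator at the extrapolated point
        gxh, gyh = operator(xh, yh)
        x, y = x - step * gxh, y - step * gyh
    return x, y

x, y = extragradient(1.0, 1.0)
print(abs(x) < 1e-3 and abs(y) < 1e-3)  # prints True: iterates converge to the saddle (0, 0)
```

On this bilinear game, plain simultaneous gradient descent-ascent cycles or diverges, while the extra-gradient extrapolation contracts toward the saddle point, which is why the method is standard for GAN-style min-max training.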