On gradient regularizers for MMD GANs
Authors: Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | experiments show that it stabilizes and accelerates training, giving image generation models that outperform state-of-the-art methods on 160×160 CelebA and 64×64 unconditional ImageNet. |
| Researcher Affiliation | Academia | Michael Arbel, Gatsby Computational Neuroscience Unit, University College London; Danica J. Sutherland, Gatsby Computational Neuroscience Unit, University College London; Mikołaj Bińkowski, Department of Mathematics, Imperial College London; Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London. |
| Pseudocode | No | The paper includes mathematical formulations and propositions but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for all of these experiments is available at github.com/MichaelArbel/Scaled-MMD-GAN. |
| Open Datasets | Yes | We evaluated unsupervised image generation on three datasets: CIFAR-10 [26] (60,000 images, 32×32), CelebA [29] (202,599 face images, resized and cropped to 160×160 as in [7]), and the more challenging ILSVRC2012 (ImageNet) dataset [41] (1,281,167 images, resized to 64×64). |
| Dataset Splits | No | The paper mentions using well-known datasets, but it does not provide explicit details about the specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits) for reproducibility. |
| Hardware Specification | No | The paper states that models were trained 'on a single GPU' or 'on 3 GPUs simultaneously' but does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware components. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We always used 64 samples per GPU from each of P and Q, and 5 critic updates per generator step. We used initial learning rates of 0.0001 for CIFAR-10 and CelebA, 0.0002 for ImageNet, and decayed these rates using the KID adaptive scheme of [7]: every 2,000 steps, generator samples are compared to those from 20,000 steps ago, and if the relative KID test [9] fails to show an improvement three consecutive times, the learning rate is decayed by 0.8. We used the Adam optimizer [25] with β1 = 0.5, β2 = 0.9. |
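The quoted KID-based learning-rate schedule is concrete enough to sketch in code. Below is a minimal Python sketch of that decay rule, not the authors' released implementation: `relative_kid_improved` is a hypothetical callable standing in for the three-sample relative KID significance test of [9], and the bookkeeping (saving samples only at check intervals) is an assumption about how one might realize the quoted description.

```python
class KIDAdaptiveLR:
    """Sketch of the KID adaptive learning-rate scheme quoted above.

    Every `check_every` steps, current generator samples are compared
    (against real data) with samples saved `lag` steps earlier; after
    `patience` consecutive failures of the relative KID test, the
    learning rate is multiplied by `decay`.
    """

    def __init__(self, lr, decay=0.8, check_every=2000, lag=20000, patience=3):
        self.lr = lr
        self.decay = decay
        self.check_every = check_every
        self.lag = lag
        self.patience = patience
        self.fail_count = 0
        self.saved = {}  # step -> generator samples kept for later comparison

    def maybe_update(self, step, gen_samples, real_samples, relative_kid_improved):
        # `relative_kid_improved(new, old, real)` is a hypothetical callable
        # implementing the relative KID test of [9]: it should return True
        # when `new` samples are significantly closer to `real` than `old`.
        if step % self.check_every != 0:
            return self.lr
        self.saved[step] = gen_samples
        old_samples = self.saved.pop(step - self.lag, None)
        if old_samples is None:
            return self.lr  # not enough history yet
        if relative_kid_improved(gen_samples, old_samples, real_samples):
            self.fail_count = 0
        else:
            self.fail_count += 1
            if self.fail_count >= self.patience:
                self.lr *= self.decay
                self.fail_count = 0
        return self.lr
```

In use, `maybe_update` would be called once per generator step with a fixed batch of generator and real samples; the returned rate is then fed to the Adam optimizer configured with β1 = 0.5, β2 = 0.9 as in the quoted setup.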