Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games
Authors: Yi Feng, Kaito Fujii, Stratis Skoulakis, Xiao Wang, Volkan Cevher
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results reveal fundamental differences between HB in min-max games and minimization, and numerical experiments further validate our theoretical results. ... Numerical experiments are integrated into the main text, while additional experiments and proofs are provided in the appendix due to space constraints. ... Experiments on GANs training. We provide experiments on GANs training dynamics. ... Our setup generally follows the Wasserstein GANs framework (Gulrajani et al., 2017) using the CIFAR-10 dataset. |
| Researcher Affiliation | Academia | 1Shanghai University of Finance and Economics, Shanghai, China 2National Institute of Informatics, Tokyo, Japan 3Aarhus University, Aarhus, Denmark 4Key Laboratory of Interdisciplinary Research of Computation and Economics, China 5EPFL, Lausanne, Switzerland. Correspondence to: Xiao Wang <EMAIL>. |
| Pseudocode | Yes | In Algorithm 1, we provide details of the Adam algorithm (Kingma & Ba, 2014) with negative momentum used in the GANs training experiments in Section 5.2. The main difference from the standard Adam algorithm is that the heavy ball momentum parameter β1 is chosen to be a negative number. Algorithm 1: Adam with Negative β1 |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Our setup generally follows the Wasserstein GANs framework (Gulrajani et al., 2017) using the CIFAR-10 dataset. ... Additional results for Vanilla GANs. Dataset: MNIST |
| Dataset Splits | No | The paper mentions using CIFAR-10 and MNIST datasets but does not explicitly provide details about training/test/validation splits, such as percentages or specific counts, nor does it cite a standard split being used for these experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'Adam algorithm' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Neural network architecture: Both generator and discriminator use the ResNet-32 architecture. Both generator and discriminator use the learning rate 2e-4, with a linearly decreasing step size schedule. The batch size is 64. ... The gradient penalty coefficient is chosen as 10. ... Learning rate: Both generator and discriminator use the learning rate 0.001 |
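The Pseudocode row describes Algorithm 1, standard Adam with the heavy-ball momentum parameter β1 set to a negative value. A minimal single-step sketch of that update is below; the learning rate 2e-4 is taken from the Experiment Setup row, while `beta1=-0.5` and the function name are illustrative assumptions, since the table does not quote the paper's exact β1.

```python
import numpy as np

def adam_negative_beta1_step(theta, grad, m, v, t,
                             lr=2e-4, beta1=-0.5, beta2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba, 2014) with a negative heavy-ball
    momentum parameter beta1, in the spirit of Algorithm 1.
    beta1=-0.5 is an illustrative value, not the paper's choice."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate (negative momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias correction still well defined for beta1 < 0
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

With β1 < 0, the momentum buffer partially reverses the previous search direction at each step, which is the "negative momentum" behavior the paper uses for GAN training.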