Losing Momentum in Continuous-time Stochastic Optimisation

Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Alessandro Scagliotti

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we study our scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network in an image classification problem. Our algorithm attains competitive results compared to stochastic gradient descent with momentum."
Researcher Affiliation | Academia | Kexin Jin EMAIL, Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA; Jonas Latz EMAIL, Department of Mathematics, The University of Manchester, Manchester, M13 9PL, United Kingdom; Chenguang Liu EMAIL, Delft Institute of Applied Mathematics, Technische Universiteit Delft, 2628 Delft, The Netherlands; Alessandro Scagliotti EMAIL, CIT School, Technische Universität München
Pseudocode | No | The paper presents its algorithms as mathematical equations (e.g., (9), (10), (36)) but does not present them in a structured pseudocode block or algorithm section; see the momentum-decay sketch after the table.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "We now employ the discrete SGMP method (36) to solve the CIFAR-10 (Krizhevsky, 2009) image classification task with a convolutional neural network (CNN)."
Dataset Splits | Yes | "The CIFAR-10 data set consists of 6 × 10⁴ colour images (32 × 32 pixels) which are split into 5 × 10⁴ training images and 10⁴ test images." (See the data-loading sketch after the table.)
Hardware Specification | Yes | "The training is done with Google Colab using GPUs (often Tesla V100, sometimes Tesla A100)."
Software Dependencies | No | The paper mentions general software environments like Google Colab and the use of CNNs, but does not specify particular software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We train for 800 epochs with batch size ℓ = 100 and no weight decay. We use constant learning rate η = 0.01 for SGD and the classical momentum. In classical momentum, we set the momentum hyperparameter ρ = 0.9." (A hedged configuration sketch follows the table.)
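Regarding the Pseudocode row: since the method appears only as equations (9), (10), and (36), the following is a minimal Python sketch of the general "losing momentum" idea, a heavy-ball update whose momentum coefficient decays towards zero. The decay schedule `rho0 / (1 + decay * k)` and the exact update form are illustrative assumptions, not a transcription of the paper's discrete SGMP update (36).

```python
import numpy as np

def decaying_momentum_sketch(grad_fn, x0, eta=0.01, rho0=0.9,
                             decay=1e-3, steps=1000, rng=None):
    """Hypothetical heavy-ball iteration with a vanishing momentum
    coefficient; the schedule below is an assumption, not the
    paper's equation (36)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for k in range(steps):
        rho_k = rho0 / (1.0 + decay * k)  # momentum is "lost" as k grows
        g = grad_fn(x, rng)               # stochastic gradient estimate
        v = rho_k * v - eta * g           # velocity update
        x = x + v                         # parameter update
    return x

# Toy usage: noisy quadratic f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = decaying_momentum_sketch(grad, x0=np.ones(10))
```

As rho_k shrinks, the iteration moves from momentum-like behaviour early on towards plain SGD-like steps later, which is the intuition the paper's title suggests.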
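For the Open Datasets and Dataset Splits rows, a minimal sketch of fetching CIFAR-10 with its standard 50,000/10,000 train/test split; torchvision is an assumption here, as the paper does not name its data-loading library.

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 ships pre-split into 50,000 training and 10,000 test
# images of 32x32 pixels, matching the counts quoted above.
transform = T.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000
```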
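For the Experiment Setup row, a hedged sketch of the reported baseline configuration: constant learning rate η = 0.01, momentum ρ = 0.9, batch size ℓ = 100, no weight decay, 800 epochs. PyTorch, the stand-in model, and the dummy data are assumptions; the paper does not specify its software stack or architecture beyond "a CNN".

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy CIFAR-10-shaped tensors so the sketch runs standalone.
images = torch.randn(500, 3, 32, 32)
labels = torch.randint(0, 10, (500,))
loader = DataLoader(TensorDataset(images, labels), batch_size=100, shuffle=True)

# Stand-in model; the paper trains a CNN whose details are not given here.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
loss_fn = torch.nn.CrossEntropyLoss()

# Classical-momentum baseline: lr = 0.01, rho = 0.9, no weight decay.
# The plain-SGD baseline is the same call without the momentum argument.
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=0.0)

for epoch in range(2):  # the paper reports 800 epochs
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```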