Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism
Authors: Tehila Dahan, Kfir Y. Levy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies further validate the robustness and enhanced stability of our approach. We empirically demonstrate the improved stability and performance of our methods over various baselines, confirming both the theoretical and practical advantages of our approach. Section 6: EXPERIMENTS |
| Researcher Affiliation | Academia | Tehila Dahan, ECE Department, Technion, Haifa, Israel; Kfir Y. Levy, ECE Department, Technion, Haifa, Israel |
| Pseudocode | Yes | Algorithm 1: µ2-SGD; Algorithm 2: µ2-Extra SGD. |
| Open Source Code | Yes | our GitHub repository: https://github.com/dahan198/mu2sgd |
| Open Datasets | Yes | The evaluation is conducted on the MNIST dataset (LeCun et al., 2010), using a logistic regression model. We demonstrate the effectiveness of our approach in non-convex settings using a 2-layer convolutional network on the MNIST dataset and ResNet-18 on the CIFAR-10 dataset (Krizhevsky et al., 2014). |
| Dataset Splits | No | The paper mentions that for MNIST, "Both the training and testing phases employed mini-batches of size 64, with one full pass (epoch) over the dataset." For CIFAR-10, "We trained ResNet-18 for 25 epochs using mini-batches of size 32." While it indicates data usage for training and testing, it does not provide specific split percentages, sample counts, or explicit references to predefined train/test/validation splits. |
| Hardware Specification | Yes | The convex experiments were run on an Apple M2 chip, while the non-convex experiments were executed on an NVIDIA A30 GPU. |
| Software Dependencies | No | All experiments were conducted using the PyTorch framework. The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | We compared the following optimization algorithms over a range of fixed learning rates. The convex experiments were run on the MNIST dataset... Both the training and testing phases employed mini-batches of size 64, with one full pass (epoch) over the dataset. The following algorithms were evaluated with their respective parameter settings: µ2-SGD with αt = t and βt = 1/t, STORM with βt = 1/t, and Anytime-SGD with αt = t. We trained ResNet-18 for 25 epochs using mini-batches of size 32. Training included RandomCrop (32×32, padding=2, p=0.5) and RandomHorizontalFlip (p=0.5) for data augmentation. The following algorithms were evaluated with their respective fixed parameter settings: µ2-SGD with γt = 0.1 and βt = 0.9, STORM with βt = 0.9, and Anytime-SGD with γt = 0.1. |
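To make the quoted parameter settings concrete, the following is a minimal hypothetical sketch (not the authors' reference implementation from the repository above) of a µ2-SGD-style loop on a toy quadratic objective. It assumes the standard building blocks the method's name suggests — a STORM-style corrected momentum combined with Anytime-SGD weighted query-point averaging — and uses the convex-experiment schedules quoted in the table (αt = t, βt = 1/t) with a fixed learning rate.

```python
import numpy as np

# Hypothetical illustration (NOT the paper's reference code): a mu^2-SGD-style
# loop on the toy objective f(x) = 0.5 * ||x||^2 with noisy gradients.
# Assumed structure: STORM-style corrected momentum + Anytime-SGD weighted
# query-point averaging, with the quoted schedules alpha_t = t, beta_t = 1/t.

rng = np.random.default_rng(0)

def stoch_grad(point, noise):
    """Stochastic gradient of 0.5*||x||^2: true gradient plus a noise sample."""
    return point + noise

gamma = 0.1                 # fixed learning rate, as in the experiments
w = np.ones(5)              # iterate
x = w.copy()                # query point (weighted average of iterates)
x_prev = x.copy()
momentum = np.zeros(5)
alpha_sum = 0.0

for t in range(1, 101):
    alpha_t, beta_t = float(t), 1.0 / t
    noise = 0.1 * rng.standard_normal(5)   # same sample at both query points
    g_new = stoch_grad(x, noise)
    g_old = stoch_grad(x_prev, noise)
    # STORM-style correction: d_t = g_t + (1 - beta_t) * (d_{t-1} - g_{t-1});
    # at t = 1, beta_t = 1, so the correction term vanishes.
    momentum = g_new + (1.0 - beta_t) * (momentum - g_old)
    w = w - gamma * momentum
    # Anytime-style query point: running alpha_t-weighted average of iterates.
    alpha_sum += alpha_t
    x_prev = x
    x = x + (alpha_t / alpha_sum) * (w - x)

print(float(np.linalg.norm(x)))  # distance to the optimum x* = 0
```

The key design point the sketch tries to convey is the "double momentum": the STORM correction reduces gradient-noise variance (the shared noise sample cancels in `momentum - g_old`), while the Anytime averaging stabilizes the query points at which gradients are taken.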