On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Authors: Xianliang Li, Jun Luo, Zhiwei Zheng, Hanxiao Wang, Li Luo, Lingkun Wen, Linlong Wu, Sheng Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers. (Page 1) ... In this section, we present an empirical study to discover the influence of the momentum coefficients by comparing the test performance on momentum systems with different dynamic magnitude responses. We train VGG (Simonyan & Zisserman, 2014) on the CIFAR-10 (Krizhevsky et al., 2009) dataset and ResNet50 (He et al., 2016) on the CIFAR-100 dataset using different momentum coefficients, while keeping all other hyperparameters unchanged. For each experiment, we report the mean and standard error (as subscripts) of test accuracy for 3 runs with random seeds from 0-2. The detailed experimental settings can be found in Appendix D. The experimental results on CIFAR-10 show high similarity to those on CIFAR-100. Thus, here, we mainly focus on the analysis based on CIFAR-100 and defer the experimental results of VGG16 on CIFAR-10 to Appendix C.3. (Page 4) |
| Researcher Affiliation | Academia | Xianliang Li 1,2, Jun Luo 1,2, Zhiwei Zheng 3, Hanxiao Wang 2,4, Li Luo 5, Lingkun Wen 2,6, Linlong Wu 7, Sheng Xu 1. 1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 University of California, Berkeley; 4 Institute of Automation, Chinese Academy of Sciences; 5 Sun Yat-sen University; 6 Shanghai Astronomical Observatory, Chinese Academy of Sciences; 7 University of Luxembourg. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: FSGDM. Input: Σ, c, v, N; Initialization: m_0, µ = cΣ, δ = Σ/N; for each t = 1, 2, …: g_t = ∇L_t(x_{t−1}, ζ_{t−1}); u(t) = t/(t + µ); u_t = u(⌈t/δ⌉·δ); m_t = u_t·m_{t−1} + v·g_t; x_t = x_{t−1} − α_t·m_t; end |
| Open Source Code | Yes | Our implementation of FSGDM is available at https://github.com/yinleung/FSGDM. |
| Open Datasets | Yes | We train VGG (Simonyan & Zisserman, 2014) on the CIFAR-10 (Krizhevsky et al., 2009) dataset and ResNet50 (He et al., 2016) on the CIFAR-100 dataset... (Page 4) ...Tiny-ImageNet (Le & Yang, 2015)... (Page 7) ...ILSVRC 2012 ImageNet Russakovsky et al. (2015). (Page 7) ...IWSLT14 German-English translation task (Cettolo et al., 2014)... (Page 8) ...Walker2d-v4, HalfCheetah-v4, and Ant-v4, which are continuous control environments simulated by the standard and widely-used engine, MuJoCo (Todorov et al., 2012). (Page 8) |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split percentages, sample counts, or citations to predefined splits. It mentions using standard datasets like CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet, which commonly have predefined splits, but it does not state what those splits are in the text. |
| Hardware Specification | Yes | All experiments are conducted on RTX 4090 or A100 GPUs. (Page 18) ...We train all models for 100 epochs using a single NVIDIA RTX 4090 GPU. (Page 18) |
| Software Dependencies | No | The paper mentions using "PyTorch tutorial code" (Page 18), the "FairSeq framework" (Page 18), and the "Tianshou codebase (Weng et al., 2022)" (Page 8). However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | We choose the Cosine Annealing LR (Loshchilov & Hutter, 2016) as our training scheduler. Additionally, we set the learning rate as 1e-1 for all experiments, while the weight decay is set as 5e-4 for experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet, and 1e-1 for ImageNet. All models we used simply follow their paper's original architecture, and adopt the weight initialization introduced by He et al. (2015). Additionally, we train 300 epochs for experiments on CIFAR-10 and CIFAR-100 and train 100 epochs for Tiny-ImageNet and ImageNet. We use a 128 batch size for experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet, and 256 for ImageNet. (Page 18) ...We set the maximum batch size to 4,096 tokens and apply gradient clipping with a threshold of 0.1. The baseline learning rate is set to 0.25, and for the optimizer, we use a weight decay of 0.0001. (Page 18) ...we searched for suitable learning rates across the three games, ultimately setting 10e-2, 10e-2 and 10e-3 for Walker2d-v4, HalfCheetah-v4, and Ant-v4, respectively. (Page 18) |
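The FSGDM pseudocode in the table above can be sketched as a single scalar update step. The following is a minimal, illustrative Python implementation, not the authors' released code: the hyperparameter values (`c`, `v`, `n_stages`) are assumptions for demonstration, and the staircase coefficient u_t = u(⌈t/δ⌉·δ) with u(t) = t/(t + µ) follows the reconstructed Algorithm 1.

```python
import math

def fsgdm_update(x, m, grad, t, lr, total_steps, c=15.0, v=1.0, n_stages=100):
    """One FSGDM step (sketch). c, v, and n_stages are illustrative
    placeholders, not the paper's tuned defaults."""
    mu = c * total_steps             # µ = c·Σ controls how fast u_t grows
    delta = total_steps / n_stages   # δ = Σ/N: length of each coefficient stage
    # Piecewise-constant (staircase) momentum coefficient:
    # u_t = u(⌈t/δ⌉·δ), with u(t) = t/(t + µ), so u_t rises toward 1 over training
    t_stage = math.ceil(t / delta) * delta
    u_t = t_stage / (t_stage + mu)
    m = u_t * m + v * grad           # m_t = u_t·m_{t−1} + v·g_t
    x = x - lr * m                   # x_t = x_{t−1} − α_t·m_t
    return x, m
```

Because u_t is small early on, early steps behave almost like plain SGD (preserving the raw gradient), while the growing u_t progressively amplifies low-frequency gradient components later in training, matching the paper's stated design goal. For example, minimizing f(x) = x² from x = 5 drives x toward 0 within a hundred steps at lr = 0.05.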