SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

Authors: Kwangryeol Park, Seulki Lee

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiment, SMMF takes up to 96% less memory compared to state-of-the-art memory-efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance on various CNN and Transformer tasks.
Researcher Affiliation | Academia | Kwangryeol Park (Artificial Intelligence Graduate School, UNIST, South Korea) and Seulki Lee (Department of Computer Science and Engineering, UNIST, South Korea)
Pseudocode | Yes | Algorithm 1: Overall SMMF applied to each layer. The elements of r, c, M, V, and S are initially set to zeros.
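To make the memory-saving idea behind Algorithm 1 concrete, below is a minimal illustrative sketch of rank-1 momentum factorization in the Adafactor style, which SMMF builds on. This is NOT the paper's exact square-matricization procedure: the variable names `r` and `c` are borrowed from Algorithm 1's description, but the factorization shown here (row sums, column sums, outer-product reconstruction) is an assumption based on the cited prior work, not the authors' code.

```python
import numpy as np

def factor_moment(V):
    """Compress a nonnegative moment matrix V (m x n) into two vectors.

    Stores m + n values instead of m * n, which is where the memory
    savings of factored optimizers come from (illustrative only).
    """
    r = V.sum(axis=1)  # per-row sums, shape (m,)
    c = V.sum(axis=0)  # per-column sums, shape (n,)
    return r, c

def reconstruct_moment(r, c):
    """Rank-1 approximation of V: outer(r, c) normalized by total mass."""
    return np.outer(r, c) / r.sum()

# Example: a squared-gradient matrix, as a second-moment stand-in.
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 6)) ** 2
r, c = factor_moment(V)
V_hat = reconstruct_moment(r, c)
assert V_hat.shape == V.shape  # same shape, but only m + n values stored
```

For a matrix that is exactly rank-1 and nonnegative, this reconstruction is lossless; for general moment matrices it is an approximation, which is the trade-off these factored optimizers accept in exchange for memory.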
Open Source Code | Yes | Code: https://github.com/eai-lab/SMMF
Open Datasets | Yes | We apply the five optimizers, including SMMF, to two representative image tasks, i.e., image classification and object detection, and evaluate them by 1) training ResNet-50 (He et al. 2016) and MobileNetV2 (Dong et al. 2020) on CIFAR-100 (Krizhevsky, Hinton et al. 2009) and ImageNet (Russakovsky et al. 2015), and 2) training YOLOv5s and YOLOv5m (Ultralytics 2021) on COCO (Lin et al. 2015).
Dataset Splits | Yes | We apply the five optimizers, including SMMF, to two representative image tasks, i.e., image classification and object detection, and evaluate them by 1) training ResNet-50 (He et al. 2016) and MobileNetV2 (Dong et al. 2020) on CIFAR-100 (Krizhevsky, Hinton et al. 2009) and ImageNet (Russakovsky et al. 2015), and 2) training YOLOv5s and YOLOv5m (Ultralytics 2021) on COCO (Lin et al. 2015). (Implies use of the standard benchmark splits for these well-known datasets.)
Hardware Specification | No | The paper does not explicitly state the hardware used for its experiments, such as GPU models, CPU models, or memory. It mentions 'different machines' in passing, but gives no specifics.
Software Dependencies | No | We implement the proposed SMMF using PyTorch (Paszke et al. 2017), which is available both on GitHub and in Appendix M. (PyTorch is mentioned but without a specific version number.)
Experiment Setup | No | The detailed experimental setups and training configurations are provided in Appendix L.