FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Authors: Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To verify the practical applicability of FRUGAL, we conduct extensive experiments in popular real-world scenarios. In these experiments, we pre-train LLaMA-like models (up to 1B parameters) on the Colossal Clean Crawled Corpus (C4) dataset (Raffel et al., 2020) and fine-tune RoBERTa (Liu, 2019) on the GLUE benchmark (Wang, 2018). The results show that our method significantly outperforms previous memory-efficient algorithms while using less memory budget."
Researcher Affiliation | Collaboration | 1Yandex, Russia; 2Moscow Institute of Physics and Technology, Russia; 3Ivannikov Institute for System Programming RAS, Russia; 4Skolkovo Institute of Science and Technology, Russia; 5Mohamed bin Zayed University of Artificial Intelligence, UAE. Correspondence to: Philip Zmushko <EMAIL>.
Pseudocode | Yes | "Algorithm 1 FRUGAL (State-Full, State-Free). Input: model f_θ with p parameter sets {θ_i ∈ R^{d_i}}_{i=1}^p, loss L, gradient projectors {P_{k,i}}_{i=1}^p, number of steps K... Algorithm 2 FRUGAL (SGDM, SGD)... Algorithm 4 FRUGAL step pseudocode, PyTorch-like... Algorithm 5 Examples of state-full and state-free steps for Algorithm 4"
Open Source Code | Yes | "The code is available at https://anonymous.4open.science/r/FRUGAL-D3CA."
Open Datasets | Yes | "In these experiments, we pre-train LLaMA-like models (up to 1B parameters) on the Colossal Clean Crawled Corpus (C4) dataset (Raffel et al., 2020) and fine-tune RoBERTa (Liu, 2019) on the GLUE benchmark (Wang, 2018)."
Dataset Splits | Yes | "To verify the practical applicability of FRUGAL, we conduct extensive experiments in popular real-world scenarios. In these experiments, we pre-train LLaMA-like models (up to 1B parameters) on the Colossal Clean Crawled Corpus (C4) dataset (Raffel et al., 2020) and fine-tune RoBERTa (Liu, 2019) on the GLUE benchmark (Wang, 2018)... We evaluated the performance of our framework in memory-efficient fine-tuning using the GLUE benchmark (Wang, 2018), a widely-used collection of tasks for evaluating language models... Following the experimental protocol from Hu et al. (2023), we apply memory-efficient methods to the same parameter subsets: the Q, K, V, Up, and Down projection matrices. We used the same hyperparameter configuration as in the original work."
Hardware Specification | No | No specific hardware details for running the experiments were provided in the paper. The mention of 'A100-80GB' was in the context of memory requirements for large models, not as hardware used by the authors.
Software Dependencies | No | The paper implies the use of PyTorch through the pseudocode section 'Algorithm 4 FRUGAL step pseudocode, PyTorch-like', but it does not specify version numbers for PyTorch or any other software libraries.
Experiment Setup | Yes | "The core setup for pre-training is taken from Zhao et al. (2024a). We utilize LLaMA-based (Touvron et al., 2023a) model architectures and train them on the Colossal Clean Crawled Corpus (C4) dataset (Raffel et al., 2020). The C4 dataset is intended for pre-training, making this setup a good approximation of real-world applications. A detailed description of the setup can be found in Appendix A.1... We used standard Adam hyperparameters: β1 = 0.9, β2 = 0.999, ε = 1e-8. For all methods except GaLore, we selected the learning rate equal to the optimal learning rate for Adam, which we determined through a grid search among the values [1e-4, 3e-4, 1e-3, 3e-3]. FRUGAL's learning rate for the state-free optimizer was set equal to that for the state-full optimizer for simplicity and ease of tuning."
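The core idea behind the pseudocode row above (Algorithms 1 and 4) is that FRUGAL splits each update into a state-full part, handled by an optimizer with per-coordinate state such as Adam, and a state-free part, handled by a stateless update such as plain SGD. The toy sketch below illustrates this split on a flat parameter list, with Adam state kept only for a chosen index subset; all names (`frugal_step`, `full_idx`) are hypothetical and this is an illustration of the idea, not the authors' code.

```python
import math

def frugal_step(params, grads, full_idx, state, lr,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One FRUGAL-style step (illustrative sketch, not the paper's code).

    Coordinates listed in `full_idx` receive a state-full Adam update;
    every other coordinate receives a state-free plain-SGD update, so
    optimizer state (m, v) is stored only for the state-full subset.
    """
    m, v, t = state
    t += 1
    new_params = list(params)
    # State-full block: Adam with bias correction on the selected subset.
    for j, i in enumerate(full_idx):
        g = grads[i]
        m[j] = beta1 * m[j] + (1 - beta1) * g
        v[j] = beta2 * v[j] + (1 - beta2) * g * g
        m_hat = m[j] / (1 - beta1 ** t)
        v_hat = v[j] / (1 - beta2 ** t)
        new_params[i] = params[i] - lr * m_hat / (math.sqrt(v_hat) + eps)
    # State-free block: plain SGD, no per-coordinate state retained.
    for i in range(len(params)):
        if i not in full_idx:
            new_params[i] = params[i] - lr * grads[i]
    return new_params, (m, v, t)
```

The memory saving comes from `m` and `v` having only `len(full_idx)` entries instead of one per parameter; the paper's Algorithm 1 additionally re-draws the state-full subspace via gradient projectors at each step.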
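The hyperparameter protocol in the setup row above (fixed Adam moments, learning rate chosen by grid search, and the state-free learning rate tied to the state-full one) can be expressed as a small configuration sketch. The helper name `pick_lr` and the `eval_loss` callback are hypothetical; this only mirrors the selection rule described in the quote.

```python
# Hyperparameters quoted in the paper's setup description.
ADAM_HPARAMS = {"beta1": 0.9, "beta2": 0.999, "eps": 1e-8}
LR_GRID = [1e-4, 3e-4, 1e-3, 3e-3]

def pick_lr(eval_loss, grid=LR_GRID):
    """Return the grid learning rate with the lowest evaluation loss.

    Per the paper's protocol, the selected value is then reused for
    both the state-full and the state-free optimizer in FRUGAL.
    """
    return min(grid, key=eval_loss)
```

Tying both learning rates together removes one tuning axis, which is part of why the authors describe the scheme as easy to tune.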