Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning

Authors: Minping Chen, You-Liang Huang, Zeyi Wen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results across common datasets on different pre-trained backbones (i.e., RoBERTa-large, OPT-13B and OPT-30B) demonstrate that LoHO can significantly improve the predictive accuracy and convergence rate of MeZO, while controlling the memory footprint during fine-tuning.
Researcher Affiliation | Academia | Minping Chen (1), You-Liang Huang (1), Zeyi Wen* (1,2); (1) The Hong Kong University of Science and Technology (Guangzhou); (2) The Hong Kong University of Science and Technology. EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Our code is available at https://github.com/Chan-1996/LoHO.
Open Datasets | Yes | For the RoBERTa-large experiments, we used the following datasets: SST-2 (Socher et al. 2013), RTE (Cer et al. 2017), MNLI (Williams, Nangia, and Bowman 2018) and SNLI (Bowman et al. 2015)... for the OPT experiments, we used the following datasets, including RTE (Cer et al. 2017), BoolQ (Clark et al. 2019), CB (De Marneffe, Simons, and Tonhauser 2019), MultiRC (Khashabi et al. 2018) and WIC (Pilehvar and Camacho-Collados 2018).
Dataset Splits | Yes | For the RoBERTa-large experiments, we used the following datasets:... We followed the settings of Malladi et al. (2024), which used 512 examples per class for both training and validation... for the OPT experiments,... We randomly sampled 1,000 examples for training, 500 examples for validation, and 1,000 examples for testing, which is the same as MeZO (Malladi et al. 2024).
Hardware Specification | Yes | For example, we find that when using an A800 GPU to fine-tune OPT-13B with MeZO, it exhibits over 10GB of free memory... Memory budget: a single RTX 4090 GPU with 24GB memory... Memory budget: a single A800 GPU with 80GB memory.
Software Dependencies | No | The paper mentions several optimizers (e.g., Adam, AdamW, SGD, MeZO) and a 'sparse operations library' but does not provide specific version numbers for any of these software components or the underlying deep learning framework.
Experiment Setup | Yes | Another question is how to set the ratio of parameters to be updated by the FO optimizer in each layer... the learning rate of the ZO optimizer can be configured to be several orders of magnitude lower than that of the FO optimizer... there is a perturbation scale ϵ in the gradient estimation function (cf. Equation 1) which is commonly set to a value much smaller than one (e.g., 0.01 or 0.001) (Malladi et al. 2024)... For example, for the OPT-30B model, the maximum number of FO layers is four using a single A800 GPU... bz=64 (RoBERTa), bz=16 (OPT-13B), bz=8 (OPT-30B).
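The Experiment Setup row mentions a perturbation scale ϵ in the paper's gradient estimation function (Equation 1), following MeZO (Malladi et al. 2024). The sketch below illustrates the general idea of a MeZO-style two-point (SPSA) zeroth-order gradient estimate; it is a minimal NumPy illustration under our own assumptions (function and variable names are ours), not the paper's implementation:

```python
import numpy as np

def spsa_grad_estimate(loss_fn, theta, eps=1e-3, seed=0):
    """Two-point zeroth-order (SPSA-style) gradient estimate.

    Perturbs the parameters along a shared Gaussian direction z and
    estimates the gradient from two forward passes only:
        g_hat = (L(theta + eps*z) - L(theta - eps*z)) / (2*eps) * z
    `eps` is the perturbation scale, commonly set much smaller than one
    (e.g., 0.01 or 0.001) per the quoted setup.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(theta.shape)          # shared random direction
    loss_plus = loss_fn(theta + eps * z)          # forward pass 1
    loss_minus = loss_fn(theta - eps * z)         # forward pass 2
    scalar = (loss_plus - loss_minus) / (2 * eps) # directional derivative estimate
    return scalar * z                             # projected back onto z

# Toy usage: quadratic loss sum(theta^2), whose true gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
g = spsa_grad_estimate(lambda t: float(np.sum(t ** 2)), theta)
```

Because the estimate is `(z . grad) * z`, its inner product with the true gradient is non-negative in expectation, which is why such estimates can drive descent despite using no backward pass.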
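The Dataset Splits row quotes a random-sampling protocol for the OPT experiments (1,000 train / 500 validation / 1,000 test, as in MeZO). A minimal sketch of such disjoint random sampling, with the function name and structure assumed by us rather than taken from the paper's code:

```python
import random

def sample_splits(dataset, n_train=1000, n_val=500, n_test=1000, seed=42):
    """Randomly sample disjoint train/val/test subsets.

    `dataset` is any indexable sequence of examples; the default sizes
    mirror the quoted OPT setup (1,000 / 500 / 1,000).
    """
    rng = random.Random(seed)
    idx = list(range(len(dataset)))
    rng.shuffle(idx)                                  # one shuffle, then slice
    train = [dataset[i] for i in idx[:n_train]]
    val = [dataset[i] for i in idx[n_train:n_train + n_val]]
    test = [dataset[i] for i in idx[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test

# Toy usage on a synthetic dataset of 3,000 examples.
data = list(range(3000))
train, val, test = sample_splits(data)
```

Slicing one shuffled index list guarantees the three subsets are disjoint, which a per-split independent sample would not.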