Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization

Authors: Zhanfeng Mo, Long-Kai Huang, Sinno Jialin Pan

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On LLaMA-1B, LORO achieves a perplexity score 2% better than the full-size baseline, with 54% less model memory cost, and offers a 1.8× speedup in training and a 2.2× speedup in inference. The code is available on GitHub. ... Extensive experiments demonstrate that LORO can discover competitive low-rank models with performance comparable to full-size baselines, while providing significant memory reduction and acceleration in both training and inference.
Researcher Affiliation | Collaboration | Nanyang Technological University, Singapore; Tencent AI Lab; The Chinese University of Hong Kong. EMAIL; EMAIL; EMAIL
Pseudocode | Yes | Algorithm 1: Low-rank Riemannian Optimizer. Algorithm 2: Low-rank Riemannian Optimizer (LORO) in PyTorch (Paszke et al., 2019)
Open Source Code | Yes | The code is available on GitHub: https://github.com/mzf666/LORO-main
Open Datasets | Yes | We train all the models on the C4 (Colossal Clean Crawled Corpus) dataset (Raffel et al., 2019), a large-scale cleaned dataset designed for language model pretraining. ... Following the experiment setup in (Zhao et al., 2024, Section 5.4), we extend our LORO to finetune the pretrained RoBERTa-base model (Liu et al., 2019) on GLUE datasets (Wang et al., 2019).
Dataset Splits | Yes | Following the experiment setup in (Zhao et al., 2024, Section 5.4), we extend our LORO to finetune the pretrained RoBERTa-base model (Liu et al., 2019) on GLUE datasets (Wang et al., 2019). ... Table 8: Hyperparameters of LORO in fine-tuning RoBERTa experiments. ... # Epochs
Hardware Specification | Yes | All the experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA 40GB A100 GPUs. ... We run all the experiments on 1 NVIDIA 40GB A100 GPU
Software Dependencies | No | All the experiments are implemented in PyTorch (Paszke et al., 2019)... Adam optimizer (Kingma & Ba, 2015)... LLaMA-based language model (Touvron et al., 2023b)... Huggingface on the GLUE benchmark. The paper mentions software names but does not provide specific version numbers for key libraries such as PyTorch or Huggingface.
Experiment Setup | Yes | For all runs, we set the max data sequence length to 256 with a batch size of 512 (i.e., a token batch size of 131K). ... we employ a learning rate warmup starting from 0 during the first 10% of the pretraining steps, followed by a cosine annealing scheduler that decays to 10% of the maximum learning rate. We initialize the low-rank factors with Xavier initialization (Glorot & Bengio, 2010)... we set the LORO exact update frequency to K = 500 and the learning rate to 0.01 for the LLaMA-60M, -130M, and -350M models, while for LLaMA-1B, we set K = 200 and the learning rate to 0.005. ... Table 8: Hyperparameters of LORO in fine-tuning RoBERTa experiments.
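The pseudocode row above refers to an optimizer that trains low-rank weight factors directly. As a rough illustration of the parameterization involved (a hedged sketch, not the authors' LORO implementation; the names `lowrank_linear`, `B`, and `A` are invented for this example), a linear layer's m×n weight W is replaced by a factor pair B (m×r) and A (r×n), so only r(m+n) parameters are stored and trained instead of mn:

```python
import numpy as np

# Illustrative sketch only: a low-rank factorized linear map W ~= B @ A,
# the kind of parameterization LORO-style pretraining optimizes directly.
# All names here are invented for the sketch, not taken from the paper's code.

def lowrank_linear(x, B, A):
    """Compute x @ (B @ A).T without materializing the full m x n weight."""
    return (x @ A.T) @ B.T  # (batch, n) -> (batch, r) -> (batch, m)

m, n, r = 512, 1024, 64  # output dim, input dim, rank (r << min(m, n))
rng = np.random.default_rng(0)

# Xavier-style initialization of the low-rank factors, as the report quotes.
B = rng.normal(0.0, np.sqrt(2.0 / (m + r)), size=(m, r))
A = rng.normal(0.0, np.sqrt(2.0 / (r + n)), size=(r, n))

x = rng.normal(size=(4, n))
y = lowrank_linear(x, B, A)

full_params = m * n            # 524288 parameters for a dense weight
lowrank_params = r * (m + n)   # 98304 parameters for the factor pair
```

With these example dimensions the factor pair holds under 19% of the dense layer's parameters, which is the kind of memory saving the report cites. The actual LORO method additionally performs Riemannian update steps on the low-rank manifold, which this sketch does not show.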
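The learning-rate recipe quoted in the setup row (linear warmup from 0 over the first 10% of steps, then cosine annealing down to 10% of the maximum rate) can be sketched as a small schedule function. This is a minimal illustration under the stated settings; `lr_at_step` and its defaults are invented for the example, not the paper's code.

```python
import math

def lr_at_step(step, total_steps, max_lr, warmup_frac=0.1, min_lr_frac=0.1):
    """Return the learning rate at a given step (0-indexed)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup starting from 0.
        return max_lr * step / warmup_steps
    # Cosine annealing from max_lr down to min_lr_frac * max_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    min_lr = max_lr * min_lr_frac
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` and `max_lr=0.01` (the rate the report quotes for the 60M–350M models), the schedule starts at 0, peaks at 0.01 after step 100, and decays toward 0.001 by the final step.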