LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Authors: Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu, Dan Alistarh
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate LDAdam for fine-tuning and pre-training large language models against baseline Adam (Kingma & Ba, 2017) and the memory-efficient GaLore (Zhao et al., 2024) (Algorithm 2). We apply LDAdam for fine-tuning RoBERTa (Liu et al., 2019) and Llama-family (Touvron et al., 2023) models on the GLUE (Wang et al., 2018) and Grade-School Math (GSM) (Cobbe et al., 2021) benchmarks, respectively. |
| Researcher Affiliation | Academia | Thomas Robert¹, Mher Safaryan², Ionut-Vlad Modoranu², Dan Alistarh² — ¹Institut Polytechnique de Paris (IPP), ²Institute of Science and Technology Austria (ISTA). Correspondence to EMAIL. |
| Pseudocode | Yes | Algorithm 1 LDAdam (Practical View Only, g_t ∈ R^{n×m} / Analytical View Only, g_t ∈ R^d) |
| Open Source Code | Yes | Code is available at https://github.com/IST-DASLab/LDAdam. |
| Open Datasets | Yes | We apply LDAdam for fine-tuning RoBERTa (Liu et al., 2019) and Llama-family (Touvron et al., 2023) models on the GLUE (Wang et al., 2018) and Grade-School Math (GSM) (Cobbe et al., 2021) benchmarks, respectively. [...] We evaluate LDAdam for pre-training Llama models (Touvron et al., 2023) on the C4 dataset (Raffel et al., 2023). |
| Dataset Splits | No | The paper mentions using the GLUE, GSM8K, and C4 datasets and benchmarks, and references standard training parameters such as epochs, batch size, and sequence length. However, it does not explicitly describe how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or an explicit reference to the standard splits needed for reproducibility). |
| Hardware Specification | Yes | Table 6 reports peak memory for fine-tuning and pre-training on a single NVIDIA H100 80GB GPU with micro batch size 1 and without activation checkpointing. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' and provides examples using PyTorch functions. However, it does not specify version numbers for PyTorch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | Tables 10, 11, and 12 detail all the hyperparameters we use when fine-tuning respectively the RoBERTa-base model on the GLUE benchmark and the Llama-2 7B model on the GSM8K dataset, and when pre-training Llama models on the C4 dataset. These include Epochs, Batch Size, Learning Rate, Decay Rate β1, Decay Rate β2, Weight Decay, Dropout, Gradient Clipping, etc. |
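To make the "low-dimensional gradient statistics" idea concrete, the following is a minimal illustrative sketch of an Adam-style step whose moment estimates live in a rank-r projected subspace rather than in the full parameter space. This is an assumption-based toy (the function name, shapes, and fixed projection `P` are hypothetical), not the paper's actual LDAdam algorithm, which additionally handles projection-aware error feedback and adaptive subspace updates.

```python
import numpy as np

def low_rank_adam_step(W, G, P, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style update keeping optimizer statistics in a rank-r
    subspace. Illustrative sketch only, not the exact LDAdam method.

    W: (n, k) parameter matrix      G: (n, k) gradient
    P: (n, r) orthonormal projection onto the low-dim subspace
    m, v: (r, k) first/second moment estimates kept in low dimension
    t: 1-based step counter for bias correction
    """
    g = P.T @ G                              # project gradient: (r, k)
    m = b1 * m + (1 - b1) * g                # low-dim first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2           # low-dim second-moment estimate
    m_hat = m / (1 - b1 ** t)                # standard Adam bias correction
    v_hat = v / (1 - b2 ** t)
    update = P @ (m_hat / (np.sqrt(v_hat) + eps))  # map step back to full space
    return W - lr * update, m, v
```

The memory saving comes from `m` and `v` being (r, k) instead of (n, k); for r much smaller than n, optimizer state shrinks roughly by the factor r/n.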