On the Crucial Role of Initialization for Matrix Factorization
Authors: Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation starts with a few-shot learning task following (Malladi et al., 2023). The objective is to rapidly adapt a language model with a small training set. The datasets for this experiment are drawn from the GLUE and SuperGLUE benchmarks (Wang et al., 2019b;a). The performance of different algorithms is summarized in Tab. 2. It is evident that OLoRA, PiSSA, NoRA, and NoRA+ all outperform LoRA because their initialization strategies have provided more favorable directions for optimization. |
| Researcher Affiliation | Academia | 1 ETH Zurich, 2 The University of Texas at Austin |
| Pseudocode | Yes | We summarize NoRA and NoRA+ in Algs. 1 and 2, respectively, in the appendix, with additional explanations in Apdx. A.3. |
| Open Source Code | Yes | Code is available at https://github.com/BingcongLi/NoRA. |
| Open Datasets | Yes | The datasets for this experiment are drawn from the GLUE and SuperGLUE benchmarks (Wang et al., 2019b;a). Consistent with (Malladi et al., 2023), we randomly sample 1,000 data points for training and another 1,000 for testing. ... The base model is selected as Stable Diffusion v1.4 (Rombach et al., 2022) (0.98B parameters in total). ... We tackle commonsense reasoning tasks following the setup in (Hu et al., 2023). Training data are merged from 8 datasets listed in Tab. 4. ... For mathematical problems, we consider the GSM8K (Cobbe et al., 2021) dataset ... We also adopt the MetaMathQA dataset (Yu et al., 2024)... We also use SQuAD (question answering, (Rajpurkar et al., 2016)) in our experiments... |
| Dataset Splits | Yes | Consistent with (Malladi et al., 2023), we randomly sample 1,000 data points for training and another 1,000 for testing. |
| Hardware Specification | Yes | The experiments are conducted with PyTorch (Paszke et al., 2019) on NVIDIA H100 GPUs. |
| Software Dependencies | No | The experiments are conducted with PyTorch (Paszke et al., 2019) on NVIDIA H100 GPUs. |
| Experiment Setup | Yes | The hyperparameters adopted are searched over values in Tab. 5. Adam is adopted for optimization. ... For this experiment, we first search for the best batch sizes for LoRA, and the same batch size is applied for other tested algorithms as well. Then we search additionally for the best learning rate for each algorithm. |
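The quoted finding that OLoRA, PiSSA, NoRA, and NoRA+ beat LoRA traces back to how the low-rank adapter factors are initialized. As a minimal sketch (not the paper's actual implementation, and using toy NumPy matrices rather than model weights), the contrast can be illustrated by comparing LoRA's zero-product initialization with an SVD-based initialization in the style of PiSSA, which aligns the adapter with the principal directions of the pretrained weight:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 32, 16, 4
W = rng.standard_normal((d, k))  # toy stand-in for a pretrained weight matrix

# LoRA-style init: B = 0 and A small random, so the adapter update
# B @ A contributes exactly zero at the start of training.
A_lora = 0.01 * rng.standard_normal((r, k))
B_lora = np.zeros((d, r))

# SVD-based init (PiSSA-style sketch): split the top-r singular
# factors of W between B and A, so the adapter starts aligned with
# W's dominant spectral directions instead of at zero.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
B_svd = U[:, :r] * np.sqrt(S[:r])          # shape (d, r)
A_svd = np.sqrt(S[:r])[:, None] * Vt[:r]   # shape (r, k)

# By Eckart-Young, B_svd @ A_svd is the best rank-r approximation of W,
# and the residual norm equals the norm of the discarded singular values.
err = np.linalg.norm(W - B_svd @ A_svd)
```

The sketch only illustrates the two starting points; which one yields "more favorable directions for optimization" is the empirical question the paper's Tab. 2 addresses.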