On the Crucial Role of Initialization for Matrix Factorization

Authors: Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our evaluation starts with a few-shot learning task following (Malladi et al., 2023). The objective is to rapidly adapt a language model with a small training set. The datasets for this experiment are drawn from GLUE and SuperGLUE benchmarks (Wang et al., 2019b;a). The performance of different algorithms is summarized in Tab. 2. It is evident that OLoRA, PiSSA, NoRA, and NoRA+ all outperform LoRA because their initialization strategies have provided more favorable directions for optimization.
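The initialization contrast described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's exact algorithm: it compares the standard LoRA initialization (Gaussian A, zero B, so the update starts at zero) against an SVD-based spectral initialization in the spirit of PiSSA/NoRA, where the factors start aligned with the top singular subspace of the pretrained weight. The scaling choices are assumptions for illustration.

```python
import numpy as np

def lora_init(d_out, d_in, r, rng):
    """Standard LoRA init: A ~ Gaussian, B = 0, so the update B @ A starts at zero."""
    A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
    B = np.zeros((d_out, r))
    return B, A

def spectral_init(W, r):
    """SVD-based init (in the spirit of PiSSA/NoRA): split the top-r singular
    subspace of the pretrained weight W between the two factors, so optimization
    starts from a favorable direction.  The sqrt-split scaling is an assumption."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :r] * np.sqrt(s[:r])          # (d_out, r)
    A = np.sqrt(s[:r])[:, None] * Vt[:r]   # (r, d_in)
    return B, A

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))            # stand-in for a pretrained weight
B, A = spectral_init(W, r=2)
# B @ A equals the best rank-2 approximation of W (Eckart-Young).
```

By construction, the spectral factors reproduce the truncated SVD of the pretrained weight, while the standard LoRA factors contribute nothing at step zero.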
Researcher Affiliation Academia 1ETH Zurich, 2The University of Texas at Austin
Pseudocode Yes We summarize NoRA and NoRA+ in Algs. 1 and 2, respectively, in the appendix, with additional explanations in Apdx. A.3.
Open Source Code Yes Code is available at https://github.com/BingcongLi/NoRA.
Open Datasets Yes The datasets for this experiment are drawn from GLUE and SuperGLUE benchmarks (Wang et al., 2019b;a). Consistent with (Malladi et al., 2023), we randomly sample 1,000 data points for training and another 1,000 for testing. ... The base model is selected as Stable Diffusion v1.4 (Rombach et al., 2022) (0.98B parameters in total). ... We tackle commonsense reasoning tasks following the setup in (Hu et al., 2023). Training data are merged from 8 datasets listed in Tab. 4. ... For mathematical problems, we consider GSM8K (Cobbe et al., 2021) dataset ... We also adopt MetaMathQA dataset (Yu et al., 2024)... We also use SQuAD (question answering, (Rajpurkar et al., 2016)) in our experiments...
Dataset Splits Yes Consistent with (Malladi et al., 2023), we randomly sample 1,000 data points for training and another 1,000 for testing.
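The 1,000/1,000 split protocol quoted above is simple to reproduce. The sketch below assumes a generic indexable dataset; the seed and dataset object are illustrative, not the paper's.

```python
import random

def sample_split(dataset, n_train=1000, n_test=1000, seed=0):
    """Randomly draw disjoint train/test subsets, following the 1,000/1,000
    protocol of Malladi et al. (2023).  The seed value is illustrative."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(dataset)), n_train + n_test)  # disjoint indices
    train = [dataset[i] for i in idx[:n_train]]
    test = [dataset[i] for i in idx[n_train:]]
    return train, test

data = list(range(5000))  # stand-in for one GLUE task's examples
train, test = sample_split(data)
```

Because `random.sample` draws without replacement, the two subsets are guaranteed to be disjoint.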
Hardware Specification Yes The experiments are conducted with PyTorch (Paszke et al., 2019) on NVIDIA H100 GPUs.
Software Dependencies No The experiments are conducted with PyTorch (Paszke et al., 2019) on NVIDIA H100 GPUs.
Experiment Setup Yes The hyperparameters adopted are searched over values in Tab. 5. Adam is adopted for optimization. ... For this experiment, we first search for the best batch size for LoRA, and the same batch size is applied for other tested algorithms as well. Then we search additionally for the best learning rate for each algorithm.
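The two-stage search just described (tune batch size on the baseline once, then tune the learning rate per algorithm) can be sketched as follows. The `evaluate` callback, the algorithm names, and the use of the first learning rate as the reference in stage one are all assumptions for illustration, not the paper's exact protocol.

```python
def two_stage_search(evaluate, algorithms, batch_sizes, learning_rates,
                     baseline="lora"):
    """Two-stage hyperparameter search: pick the best batch size for the
    baseline only, fix it for every algorithm, then tune the learning rate
    per algorithm.  `evaluate(algo, bs, lr)` is a hypothetical callback
    returning a validation score (higher is better)."""
    # Stage 1: batch size is tuned on the baseline, with the first
    # learning rate as a fixed reference point (an assumption here).
    best_bs = max(batch_sizes,
                  key=lambda bs: evaluate(baseline, bs, learning_rates[0]))
    # Stage 2: reuse that batch size; tune the learning rate per algorithm.
    return {algo: (best_bs,
                   max(learning_rates,
                       key=lambda lr: evaluate(algo, best_bs, lr)))
            for algo in algorithms}
```

This mirrors the quoted setup: the batch-size grid is explored once rather than per algorithm, which keeps the search cost linear in the number of algorithms.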