MoDeGPT: Modular Decomposition for Large Language Model Compression
Authors: Chi-Heng Lin, Shangqian Gao, James Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that MoDeGPT, without relying on backward propagation, consistently matches or surpasses the performance of prior techniques that depend on gradient information, while achieving a 98% reduction in compute costs when compressing a 13B-parameter model. On LLaMA-2/3 and OPT models, MoDeGPT retains 90-95% of zero-shot performance with compression rates of 25-30%. We present a thorough evaluation of MoDeGPT, comparing it against existing methods across key metrics, including perplexity, downstream accuracy, and real-world speed improvements. The paper includes a dedicated "4 EXPERIMENTS" section detailing empirical results, comparisons, and ablation studies across various models and tasks. |
| Researcher Affiliation | Collaboration | Chi-Heng Lin, Samsung Research America; Shangqian Gao, Florida State University; James Seale Smith, Samsung Research America; Abhishek Patel, Samsung Research America; Shikhar Tuli, Samsung Research America; Yilin Shen, Samsung Research America; Hongxia Jin, Samsung Research America; Yen-Chang Hsu, Samsung Research America. The affiliations include both Samsung Research America (industry) and Florida State University (academia). |
| Pseudocode | Yes | Algorithm 1 Type-I compression for MLP by Nyström approximation. Algorithm 2 Type-II compression for key-query matrices by CR decomposition. Algorithm 3 Type-III compression for value-output matrices by SVD. |
| Open Source Code | No | The paper states: "We implemented our models using Hugging Face Transformers (Wolf et al., 2019), with correlation computations in FP64." and "We utilize the Hugging Face generation library (Wolf et al., 2019) to implement our LLM models and adapt the SliceGPT (Ashkboos et al., 2024) GitHub repository for correlation matrix estimations." This indicates the use of existing open-source tools and an adaptation of a third-party repository, but there is no explicit statement from the authors about releasing their own implementation of MoDeGPT. |
| Open Datasets | Yes | Following calibration setups similar to prior studies (Frantar et al., 2022; Ashkboos et al., 2024; Dettmers et al., 2023), we employed the WikiText-2 (Merity et al., 2016) and Alpaca datasets (Taori et al., 2023), each comprising 128 samples of 2048 characters. Zero-shot performance was evaluated using the LM Evaluation Harness (Gao et al., 2021), with task details provided in Appendix B.2. |
| Dataset Splits | Yes | Following calibration setups similar to prior studies (Frantar et al., 2022; Ashkboos et al., 2024; Dettmers et al., 2023), we employed the WikiText-2 (Merity et al., 2016) and Alpaca datasets (Taori et al., 2023), each comprising 128 samples of 2048 characters. We use a calibration set of 128 random samples, each 2048 in length, from the Alpaca dataset, and a recovery fine-tuning set of 8000 samples, each 1024 in length, employing LoRA (Hu et al., 2021). |
| Hardware Specification | Yes | Model compression and performance testing were conducted on a single NVIDIA A100 80GB GPU, except for the 70B model, for which we used 8 A100 GPUs. The throughput benchmarks in Appendix B.16 also mention: "Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50 GHz with 20 cores". |
| Software Dependencies | No | The paper mentions: "We implemented our models using Hugging Face Transformers (Wolf et al., 2019), with correlation computations in FP64." and "We utilize torch.svd and torch.pinv in PyTorch for performing Singular Value Decomposition (SVD) and computing the Moore-Penrose inverse on tensors of dtype FP64." While software packages like Hugging Face Transformers and PyTorch are mentioned, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | Unless otherwise specified, the calibration set consists of a random sample of 128 sequences, each of length 2048, from WikiText-2... MLP Module Algorithm 1 requires a ridge leverage score parameter λ. We find that the results are largely insensitive to this parameter; therefore, we simply use λ = 1 across all experiments. We use SliceGPT's hyperparameters for LoRA, except for the learning rate, which is set to 5×10⁻⁵. The other primary hyperparameters used are lora_alpha = 10, lora_r = 32, lora_dropout = 0.05, and batch_size = 3. |
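The Pseudocode row names three decomposition-based algorithms, the third of which compresses value-output matrices via truncated SVD. As a minimal, self-contained illustration of that idea (not the authors' implementation, which uses `torch.svd` in FP64), the sketch below computes a rank-1 SVD approximation in pure Python via power iteration; the function names are hypothetical.

```python
import math
import random

def rank1_svd(A, iters=200, seed=0):
    """Approximate the best rank-1 factorization sigma * u * v^T of a
    matrix A (list of row lists) by power iteration on A^T A.
    Illustrative stdlib-only sketch; real low-rank compression would
    use a full SVD routine and keep the top-k singular triplets."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        # w = A v  (matrix-vector product)
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        # z = A^T w, then renormalize to keep iterating toward the
        # dominant right singular vector
        z = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in z))
        v = [x / norm for x in z]
    # sigma = ||A v||, u = A v / sigma
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

def rank1_approx(A):
    """Reconstruct the rank-1 approximation sigma * u * v^T of A."""
    sigma, u, v = rank1_svd(A)
    return [[sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]

# A rank-1 matrix is reproduced (up to numerical error) by its
# rank-1 SVD approximation.
A = [[2.0, 4.0], [1.0, 2.0]]
A1 = rank1_approx(A)
```

Keeping only the top-k singular triplets of a weight matrix in this way is the standard SVD truncation that Algorithm 3 (Type-III compression) builds on.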