Streamlining Redundant Layers to Compress Large Language Models

Authors: Xiaodong Chen, Yuxuan Hu, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that LLM-Streamline outperforms both previous and concurrent state-of-the-art pruning methods in terms of both performance and training efficiency. Our code is available at this repository.
Researcher Affiliation | Academia | 1 Engineering Research Center of Database and Business Intelligence, MOE, China; 2 School of Information, Renmin University of China, Beijing, China; 3 Key Laboratory of Data Engineering and Knowledge Engineering, MOE, China; 4 Zhongguancun Laboratory, China
Pseudocode | No | The paper describes the workflow of LLM-Streamline in Section 2, detailing layer pruning and layer replacement, but does not present it in a structured pseudocode or algorithm block.
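Although the paper gives no algorithm block, the layer-pruning half of the workflow can be sketched from its description: score each window of consecutive layers by how little it changes the hidden states (via cosine similarity), and prune the most redundant window. This is a minimal sketch; the function name, tensor shapes, and aggregation are assumptions, not the authors' released implementation.

```python
import numpy as np

def most_redundant_window(hidden_states, n_prune):
    """Pick n_prune consecutive layers to prune.

    hidden_states: list of per-layer-boundary activations, each of shape
    (batch, seq, dim). Each window of n_prune layers is scored by the mean
    cosine similarity between the hidden states entering and leaving it;
    a sketch of the similarity-based criterion, details may differ from
    the paper's code.
    """
    scores = []
    for i in range(len(hidden_states) - n_prune):
        a = hidden_states[i].reshape(-1, hidden_states[i].shape[-1])
        b = hidden_states[i + n_prune].reshape(-1, hidden_states[i + n_prune].shape[-1])
        cos = np.sum(a * b, axis=-1) / (
            np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
        scores.append(cos.mean())
    # the window whose output most resembles its input is the most redundant
    return int(np.argmax(scores))
```

In LLM-Streamline, the pruned window would then be replaced by a single lightweight network (an FFN or Transformer layer) trained to mimic the removed layers.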
Open Source Code | Yes | Our code is available at this repository.
Open Datasets | Yes | We conduct experiments on 12 well-known classification benchmarks and 3 generation benchmarks. Our results show that for an LLM with 7B or 13B parameters and a 25% pruning rate, we can maintain 93% performance in classification tasks and 77% in generation tasks without requiring a lot of training data, outperforming existing SOTA pruning methods.
Dataset Splits | Yes | We randomly sample the data based on the distribution used by Sheared LLaMA (Xia et al., 2023), finally constructing the dataset containing 30,000 pieces of data. We randomly select 500 samples from this dataset and input them into LLMs, generating Fig. 2, and use these 500 data samples for layer pruning. All 30,000 pieces of data are used to train the lightweight network.
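The split described above (30,000 sampled examples for training the lightweight network, with a 500-example subset reused to probe layer redundancy) can be sketched as follows; the function and variable names are illustrative, not from the paper.

```python
import random

def build_splits(corpus, seed=0, train_size=30_000, probe_size=500):
    """Sketch of the data split described in the paper: sample 30k
    examples for training the lightweight network, and reuse a random
    500-example subset of them for the layer-pruning statistics."""
    rng = random.Random(seed)
    train = rng.sample(corpus, train_size)
    probe = rng.sample(train, probe_size)  # subset of the training pool
    return train, probe
```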
Hardware Specification | Yes | On a single A800 GPU, the training duration for the lightweight network is approximately 5 hours (for the Transformer layer).
Software Dependencies | No | The paper mentions using language models and training processes but does not specify software dependencies with version numbers, such as PyTorch, TensorFlow, or Python versions.
Experiment Setup | Yes | For both the FFN structure and the SwiGLU structure, the learning rate is set to 1e-3 and the weight decay is 1e-4. For the Transformer layer, the learning rate is set to 1e-5 and the weight decay is 1e-3. The model is trained using a batch size of 32 over 20 epochs. ... For layer replacement, in order to have a fairer comparison with LoRA, we conduct one epoch of post-training with a learning rate of 5e-5, a weight decay of 1e-3, and a batch size of 32. ... For LoRA, we set the rank to 128
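The quoted hyperparameters can be collected into a single helper for reference. The paper does not name its optimizer, so AdamW here is an assumption, as are the function name and the `kind` switch; only the learning rates, weight decays, batch size, and epoch counts come from the quote above.

```python
import torch

def make_optimizer(module, kind):
    """Optimizer settings quoted in the paper's experiment setup.
    AdamW is an assumed choice; the paper does not state the optimizer."""
    if kind in ("ffn", "swiglu"):        # lightweight FFN / SwiGLU replacement
        lr, wd = 1e-3, 1e-4
    elif kind == "transformer_layer":    # single Transformer-layer replacement
        lr, wd = 1e-5, 1e-3
    elif kind == "post_training":        # one-epoch post-training (vs. LoRA, rank 128)
        lr, wd = 5e-5, 1e-3
    else:
        raise ValueError(f"unknown setup: {kind}")
    return torch.optim.AdamW(module.parameters(), lr=lr, weight_decay=wd)

# All settings use batch size 32; the replacement networks train for 20 epochs,
# while post-training runs for a single epoch.
```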