CLOVER: Cross-Layer Orthogonal Vectors Pruning
Authors: Fanxu Meng, Pingzhi Tang, Fan Jiang, Muhan Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments. In Section 4.1, we compare CLOVER with SliceGPT (Ashkboos et al., 2024) and TransMLA (Meng et al., 2025), which respectively prune DeepSeek-V2-Lite (DeepSeek-AI, 2024) and LLaMA-2-7B (AI@Meta, 2023). In Section 4.2, we visualize how CLOVER removes linear redundancy between vectors, facilitating more efficient pruning. In Section 4.3, we evaluate the acceleration performance of CLOVER. In Section 4.4, we demonstrate CLOVER's ability to perform significant pruning. In Section 4.5, we apply CLOVER to orthogonalize the attention heads of the GPT-2-XL model (Radford et al., 2019), exploring its role in both pruning and fine-tuning. In Section 4.6, we conduct fine-tuning experiments on eight commonsense tasks, comparing CLOVER with SOTA PEFT methods. |
| Researcher Affiliation | Academia | 1Institute for Artificial Intelligence, Peking University 2State Key Laboratory of General Artificial Intelligence, BIGAI. Correspondence to: Muhan Zhang <EMAIL>. |
| Pseudocode | No | The paper describes the CLOVER method step-by-step in Section 3 ("CLOVER: Cross-Layer Orthogonal Vectors") using prose and mathematical formulas, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/GraphPKU/CLOVER |
| Open Datasets | Yes | We use the official Whisper-large-v3 example (LibriSpeech Long dataset (Gandhi et al., 2023)) to intuitively highlight the effectiveness of CLOVER pruning. For reference, the waveform of this input is shown in Figure 4, and the corresponding target translation script is provided in Appendix C. (https://huggingface.co/openai/whisper-large-v3) |
| Dataset Splits | Yes | The commonsense reasoning tasks consist of 8 subtasks, each with predefined training and testing sets, as described by LLM-Adapters (Hu et al., 2023). Table 7 ("Details of datasets for commonsense reasoning tasks") lists each sub-dataset, e.g. BoolQ (Clark et al., 2019): 9,427 train / 3,270 test, naturally occurring yes/no questions from unconstrained settings. |
| Hardware Specification | Yes | In Figure 3, we benchmark the inference performance of CLOVER, featuring a 92.97% reduction in the KV cache and a 50% reduction in the Q_nope, K_nope, and V head dimensions, using the vLLM framework across three GPUs with varying compute capabilities and memory sizes: 165.2 TFLOPS with 24GB memory, 312 TFLOPS with 40GB memory, and 320 TFLOPS with 64GB memory. |
| Software Dependencies | No | The paper mentions using the 'nanoGPT framework' and 'vLLM framework' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | PEFT typically converges more slowly than full-parameter fine-tuning. To accelerate convergence, we increase the learning rate from 6×10⁻⁴ to 6×10⁻³ and remove weight decay, while keeping all other hyperparameters consistent with those used in Vanilla and CLOVER-FT. |
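The table's excerpts describe CLOVER as removing linear redundancy between vectors (e.g. attention-head vectors) so that more of them can be pruned. The paper's actual procedure operates on paired cross-layer projection matrices; the snippet below is only a minimal, generic illustration of the underlying idea, using SVD to rotate a set of vectors into orthogonal directions whose near-zero singular values can be dropped. All shapes and thresholds here are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 hypothetical "head" vectors of dimension 64 that secretly span
# only a rank-3 subspace, i.e. 5 of them are linearly redundant.
basis = rng.standard_normal((3, 64))
W = rng.standard_normal((8, 3)) @ basis

# Orthogonalize: SVD rotates the vectors into orthogonal directions,
# concentrating their energy into a few leading singular values.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Prune directions whose singular value is negligible (illustrative cutoff).
keep = S > 1e-8 * S[0]
W_pruned = (U[:, keep] * S[keep]) @ Vt[keep]

print("kept directions:", int(keep.sum()))  # 3 of 8: redundancy exposed
print("max reconstruction error:", float(np.abs(W - W_pruned).max()))
```

Without the orthogonalizing rotation, no individual row of `W` could be dropped losslessly, even though the set as a whole is rank-deficient; this is the sense in which orthogonal vectors "facilitate more efficient pruning."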