CLOVER: Cross-Layer Orthogonal Vectors Pruning
Authors: Fanxu Meng, Pingzhi Tang, Fan Jiang, Muhan Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments. In Section 4.1, we compare CLOVER with SliceGPT (Ashkboos et al., 2024) and TransMLA (Meng et al., 2025), which respectively prune DeepSeek-V2-Lite (DeepSeek-AI, 2024) and LLaMA-2-7B (AI@Meta, 2023). In Section 4.2, we visualize how CLOVER removes linear redundancy between vectors, facilitating more efficient pruning. In Section 4.3, we evaluate the acceleration performance of CLOVER. In Section 4.4, we demonstrate CLOVER's ability to perform significant pruning. In Section 4.5, we apply CLOVER to orthogonalize the attention heads of the GPT-2-XL model (Radford et al., 2019), exploring its role in both pruning and fine-tuning. In Section 4.6, we conduct fine-tuning experiments on eight commonsense tasks, comparing CLOVER with SOTA PEFT methods. |
| Researcher Affiliation | Academia | 1Institute for Artificial Intelligence, Peking University 2State Key Laboratory of General Artificial Intelligence, BIGAI. Correspondence to: Muhan Zhang <EMAIL>. |
| Pseudocode | No | The paper describes the CLOVER method step-by-step in Section 3 ("CLOVER: Cross-Layer Orthogonal Vectors") using prose and mathematical formulas, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/GraphPKU/CLOVER |
| Open Datasets | Yes | We use the official Whisper-large-v3 example (LibriSpeech Long dataset (Gandhi et al., 2023)) to intuitively highlight the effectiveness of CLOVER pruning. For reference, the waveform of this input is shown in Figure 4, and the corresponding target translation script is provided in Appendix C. (https://huggingface.co/openai/whisper-large-v3) |
| Dataset Splits | Yes | The commonsense reasoning tasks consist of 8 subtasks, each with predefined training and testing sets, as described by LLM-Adapters (Hu et al., 2023). Table 7 ("Details of datasets for commonsense reasoning tasks") lists each sub-dataset, e.g. BoolQ (Clark et al., 2019): 9,427 train / 3,270 test, naturally occurring yes/no questions from unconstrained settings. |
| Hardware Specification | Yes | In Figure 3, we benchmark the inference performance of CLOVER, featuring a 92.97% reduction in the KV cache and a 50% reduction in the Q_nope, K_nope, and V head dimensions, using the vLLM framework across three GPUs with varying compute capabilities and memory sizes: 165.2 TFLOPS with 24GB memory, 312 TFLOPS with 40GB memory, and 320 TFLOPS with 64GB memory. |
| Software Dependencies | No | The paper mentions using the 'nanoGPT framework' and 'vLLM framework' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | PEFT typically converges more slowly than full-parameter fine-tuning. To accelerate convergence, we increase the learning rate from 6×10⁻⁴ to 6×10⁻³ and remove weight decay, while keeping all other hyperparameters consistent with those used in Vanilla and CLOVER-FT. |
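The table's excerpts describe CLOVER as removing linear redundancy between vectors (e.g. attention-head vectors) so that more of them can be pruned. The paper's actual procedure operates on paired cross-layer projection matrices; the snippet below is only a minimal, generic illustration of the underlying idea, using SVD to rotate a set of vectors into orthogonal directions whose near-zero singular values can be dropped. All shapes and thresholds here are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 hypothetical "head" vectors of dimension 64 that secretly span
# only a rank-3 subspace, i.e. 5 of them are linearly redundant.
basis = rng.standard_normal((3, 64))
W = rng.standard_normal((8, 3)) @ basis

# Orthogonalize: SVD rotates the vectors into orthogonal directions,
# concentrating their energy into a few leading singular values.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Prune directions whose singular value is negligible (illustrative cutoff).
keep = S > 1e-8 * S[0]
W_pruned = (U[:, keep] * S[keep]) @ Vt[keep]

print("kept directions:", int(keep.sum()))  # 3 of 8: redundancy exposed
print("max reconstruction error:", float(np.abs(W - W_pruned).max()))
```

Without the orthogonalizing rotation, no individual row of `W` could be dropped losslessly, even though the set as a whole is rank-deficient; this is the sense in which orthogonal vectors "facilitate more efficient pruning."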