LaRoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation
Authors: Kai Liu, Bowen Xu, Shaoyu Wu, Xin Chen, Hao Zhou, Yongliang Tao, Lulu Hu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with leading LLMs and demonstrate that LaRoSA is effective and robust across different types, sizes, and sparsity levels. LaRoSA presents minimal performance degradation while providing consistent wall-clock time speed-up. Specifically, for LLaMA2-7B at 40% sparsity, LaRoSA achieves a mere 0.17 perplexity gap with a consistent 1.30× wall-clock time speed-up, and reduces the accuracy gap in zero-shot tasks compared to the dense model to just 0.54%, while surpassing TEAL by 1.77% and CATS by 17.14%. |
| Researcher Affiliation | Industry | Alibaba Group. Correspondence to: Kai Liu <EMAIL>, Bowen Xu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Grid Search for Optimal Sparsity Coefficients |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own source code for the methodology described. It mentions using 'Hugging Face Open-R1 repository' for evaluation but not for releasing its own implementation. |
| Open Datasets | Yes | We use the WikiText2 train set (Merity et al., 2016) as calibration dataset for LaRoSA and other reproducible works. We conduct experiments on complex tasks such as MATH500 (Lightman et al., 2024), GPQA-Diamond (Rein et al., 2024), and AIME 24 (AIME, 2025) |
| Dataset Splits | Yes | We use the WikiText2 train set (Merity et al., 2016) as calibration dataset for LaRoSA and other reproducible works. All models are evaluated on the same 128 random samples with a 2048-token context length. |
| Hardware Specification | Yes | The computation of Q is performed on 8×80GB A100 GPUs, taking approximately 12 minutes to complete for the LLaMA3-70B model. Experiments are conducted on NVIDIA A100 GPUs. (Table 3 also lists 'A100' and 'H20' under the 'GPU' column). |
| Software Dependencies | No | The paper mentions software components like 'Triton-based kernel', 'DejaVu (Liu et al., 2023)', 'TEAL (Liu et al., 2024a)', 'lm-evaluation-harness (Gao et al., 2023)', and 'Hugging Face Open-R1 repository (Face, 2025)' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We randomly select 16 sequences with a sequence length of 2048 tokens to compute the rotation matrices Q for LaRoSA and empirical distributions for CATS and TEAL. For the sparsity coefficient α, we employ Grid Search to find the optimal hyperparameter for each model, as shown in Appendix B Algorithm 1. The optimal α for each activation type of models is presented in Appendix B Table 11. We collected ten samples, each consisting of 128 tokens, from various test datasets and generated new sequences with lengths ranging from 128 to 2048 tokens. Tensor parallelism (TP2) is set for LLaMA3-70B and Qwen2.5-72B. |
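The rows above describe computing per-layer rotation matrices Q and sparsifying activations. A minimal NumPy sketch of the general rotate-then-sparsify idea follows; the function name and the top-k magnitude rule are illustrative assumptions for this summary, not the paper's exact formulation:

```python
import numpy as np

def rotated_sparse_matvec(W, Q, x, sparsity=0.4):
    """Apply weight W to activation x via a rotated, sparsified basis.

    Q is orthogonal, so W @ x == (W @ Q) @ (Q.T @ x) when nothing is
    dropped; sparsification zeroes the smallest-magnitude rotated entries.
    """
    z = Q.T @ x
    # Number of entries to keep at the requested sparsity level.
    k = int(round(len(z) * (1.0 - sparsity)))
    # Indices of the k largest-magnitude rotated activations.
    keep = np.argpartition(np.abs(z), len(z) - k)[len(z) - k:]
    z_sparse = np.zeros_like(z)
    z_sparse[keep] = z[keep]
    # W @ Q can be fused offline, so only the sparse matvec runs online.
    return (W @ Q) @ z_sparse
```

At `sparsity=0.0` this reduces exactly to the dense product `W @ x`, which is why such methods can trade a small perplexity gap for wall-clock speed-up as sparsity rises.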
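The paper's Algorithm 1 (not reproduced in this summary) grid-searches the sparsity coefficient α per model. A minimal sketch under the assumption that the search minimizes calibration loss, with a hypothetical `eval_loss` callback standing in for evaluating the sparsified model on the calibration set:

```python
def grid_search_alpha(candidates, eval_loss):
    """Return the candidate alpha with the lowest calibration loss.

    eval_loss is a hypothetical callback that applies the sparsity
    coefficient to the model and returns a scalar loss (e.g.,
    perplexity on the WikiText2 calibration set).
    """
    best_alpha, best_loss = None, float("inf")
    for alpha in candidates:
        loss = eval_loss(alpha)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha
```

The actual algorithm may constrain the search so that the per-layer sparsity levels meet an overall target; that detail is not recoverable from this summary.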