Sparse Learning for State Space Models on Mobile

Authors: Xuan Shen, Hangyu Zheng, Yifan Gong, Zhenglun Kong, Changdi Yang, Zheng Zhan, Yushu Wu, Xue Lin, Yanzhi Wang, Pu Zhao, Wei Niu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that our method achieves superior task performance compared to other semi-structured pruning methods and achieves up to 7× speedup compared to the llama.cpp framework on mobile devices. ... We implement the sparse model with our proposed kernels on mobile devices and achieve a practical on-device speedup of up to 7× compared to llama.cpp.
Researcher Affiliation Academia 1 Northeastern University, 2 University of Georgia, 3 Harvard University; {shen.xu}@northeastern.edu
Pseudocode Yes The algorithm for sparse learning is presented in Algorithm 1, featuring r_e as the effectiveness loss ratio and r_t as the target loss ratio. The max_index operation retrieves the index of the maximum value along a specified dimension. The output N_output generated by the algorithm represents the optimal sparse structure using the proposed kernel.
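Algorithm 1 itself is not reproduced in this review; as a minimal illustration of the max-index operation it describes, a plain-Python sketch (the function name and score values are hypothetical, not from the paper) might look like:

```python
def max_index(values):
    """Return the index of the maximum value in a 1-D sequence,
    mirroring the max-index operation described for Algorithm 1."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical effectiveness scores for four candidate sparse structures;
# the selected index would correspond to the structure kept as N_output.
scores = [0.42, 0.87, 0.55, 0.31]
best = max_index(scores)
```

In the paper the operation is applied along a specified dimension of a tensor; this 1-D version only conveys the selection step.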
Open Source Code No The paper describes the implementation of their method and compares it against other open-source frameworks like llama.cpp, but does not explicitly state that their own code for the proposed methodology is publicly available, nor does it provide a link.
Open Datasets Yes We evaluate the task performance on multiple common sense reasoning datasets including LAMBADA (Paperno et al., 2016), HellaSwag (Zellers et al., 2019), PIQA (Bisk et al., 2020), Arc-easy (Clark et al., 2018), Arc-challenge (Clark et al., 2018), and WinoGrande (Sakaguchi et al., 2021).
Dataset Splits No The paper states it uses several common sense reasoning datasets but does not explicitly specify the training, test, or validation splits for these datasets. It mentions 'calibration with only 128 training samples' for weight compensation, which is not a general dataset split for model evaluation.
Hardware Specification Yes The latency evaluations are conducted on a OnePlus 11 mobile device equipped with a Snapdragon 8 Gen 2 SoC, featuring an octa-core Kryo CPU and Qualcomm Adreno 740 GPU with 16GB memory. ... Additionally, the results tested on another edge device, Xiaomi 6, which is equipped with a Snapdragon 835, featuring an octa-core CPU and an Adreno GPU with 6GB of memory, are included in Table A7.
Software Dependencies No The paper mentions comparing against 'llama.cpp (contributors, 2023a)' and that their compiler generates 'C++/Assembly code for mobile CPUs and OpenCL code for mobile GPUs'. However, no specific version numbers for llama.cpp, the compiler itself, or any other software libraries/dependencies are provided.
Experiment Setup Yes For sparse learning, we train over 1000 steps using the SGD optimizer at a starting learning rate of 0.1, decaying it by 0.1 every 200 steps. The original model weights are frozen, and sparse learning for Mamba-2.8B takes 50 minutes on an A6000 GPU. ... The batch size is set to 1 for all models unless otherwise specified. Each experiment is repeated 50 times.
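The step-decay schedule quoted above (start at 0.1, multiply by 0.1 every 200 steps over 1000 steps) can be sketched in plain Python; the function name and signature are illustrative, not from the paper:

```python
def step_decay_lr(step, base_lr=0.1, gamma=0.1, step_size=200):
    """Learning rate after `step` optimizer steps under the quoted schedule:
    start at base_lr and multiply by gamma every step_size steps."""
    return base_lr * gamma ** (step // step_size)

# Over the 1000 training steps mentioned above, the rate falls from
# 0.1 (steps 0-199) down to roughly 1e-5 (steps 800-999).
schedule = [step_decay_lr(s) for s in (0, 200, 400, 800)]
```

In a PyTorch setup this corresponds to SGD combined with a step-based learning-rate scheduler using the same `gamma` and `step_size`.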