Wasserstein Distances, Neuronal Entanglement, and Sparsity
Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To analyze the phenomenon of neuronal superposition under sparsity in greater detail, we create an experimental framework, which we dub Sparse Expansion. It expands a model into a mixture of sparse experts by clustering input embeddings layer-wise. Based on this clustering, Sparse Expansion utilizes the input-aware nature of the SparseGPT (Frantar & Alistarh, 2023) pruning algorithm to specialize different sparse experts to different sets of inputs, starting from the same base weights. Through Sparse Expansion, we are able to analyze the entangled neurons in much more detail, since now different subgroups of the inputs are being computed with different edges (Figure 1f, A8f). We find that as a neuron loses edges, its output distribution tends to shift toward a Gaussian distribution (Figure A9). However, through Sparse Expansion, the original output distribution can be better preserved under sparse computation (Figure 1e, A8e). We relate our findings to recent theoretical work on the bounds of neural computation under superposition (Hänni et al., 2024; Adler & Shavit, 2024). |
| Researcher Affiliation | Collaboration | Shashata Sawmya¹, Linghao Kong¹, Ilia Markov², Dan Alistarh²,³,⁴, & Nir Shavit¹,³,⁴ — ¹MIT, ²IST Austria, ³Neural Magic, ⁴Red Hat. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm A1 describes the sparsification process of Sparse Expansion. The sparse experts are created in a layer-wise sequential fashion for each linear layer of every FFN transformer block to create the sparse model. Algorithm A2 refers to the inference procedure of Sparse Expansion once the model is pruned following the methods described in Algorithm A1 and Section 3.1. |
| Open Source Code | Yes | Code available at https://github.com/Shavit-Lab/Sparse-Expansion. |
| Open Datasets | Yes | For reading comprehension, we use the 1-shot variant of the SQuAD 2.0 dataset (Rajpurkar et al., 2018). To assess knowledge reasoning and mathematical capabilities, we evaluate the model on the 5-shot TriviaQA-Wiki (Joshi et al., 2017) and 5-shot GSM8K (Cobbe et al., 2021) datasets, respectively. Finally, to evaluate general reasoning, we test the model on two benchmarks: an easy task, 5-shot MMLU (Hendrycks et al., 2020), and a more challenging task, 3-shot Chain-of-Thought (CoT) BIG-Bench Hard (BBH) (Suzgun et al., 2022). |
| Dataset Splits | No | The paper states that it uses "a subset of the Wikitext-2 train dataset as calibration data for input-aware pruning" and evaluates "using the corresponding test set through the perplexity metric", and it specifies N-shot settings for the evaluation benchmarks. However, it does not give specific percentages, sample counts, or explicit references to predefined splits, which would be needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We have run the layer-wise benchmarks for the typical layer sizes from Llama models on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions several software components like "SciPy", "RAPIDS library", "PyTorch", "Sparse Marlin", and the "SparseGPT GitHub repository", but consistently omits specific version numbers for these, which are necessary for reproducible dependency management. |
| Experiment Setup | Yes | For our performance benchmarks, we use 16 clusters at each level of routing in Sparse Expansion. We evaluate the performance of Sparse Expansion against other one-shot pruning techniques across a range of model sizes in Pythia and sparsities in Llama-2-7B (Figure 9). Across all model sizes of Pythia, Sparse Expansion outperforms all other pruning techniques at 50% unstructured sparsity, approaching dense performance as model size increases. Moreover, for Llama-2-7B, across all levels of sparsity, Sparse Expansion outperforms all other techniques. |
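To make the Sparse Expansion idea quoted above concrete, here is a minimal sketch of the two-phase scheme it describes: cluster a layer's input embeddings, specialize one sparse expert per cluster, and route each input at inference to the expert of its nearest centroid. This is an illustration under stated simplifications, not the authors' implementation: plain magnitude pruning stands in for input-aware SparseGPT pruning, the k-means loop is hand-rolled, and all function names (`sparse_expansion_layer`, `route_and_apply`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Hand-rolled k-means over calibration inputs (stand-in for the
    layer-wise clustering step of Sparse Expansion)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each input to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return centroids, labels

def magnitude_prune(W, sparsity):
    """Zero the smallest-magnitude fraction of weights. This is a crude
    stand-in for SparseGPT's input-aware pruning."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > thresh, W, 0.0)

def sparse_expansion_layer(W, calib_X, n_experts=4, sparsity=0.5):
    """Expand one linear layer into n_experts sparse experts, one per
    input cluster, all starting from the same base weights W."""
    centroids, labels = kmeans(calib_X, n_experts)
    # In the paper, each expert is pruned with its own cluster's inputs;
    # magnitude pruning ignores the inputs, so the experts here coincide.
    experts = [magnitude_prune(W, sparsity) for _ in range(n_experts)]
    return centroids, experts

def route_and_apply(x, centroids, experts):
    """Inference: route x to the expert of the nearest centroid."""
    j = int(np.argmin(((centroids - x) ** 2).sum(-1)))
    return experts[j] @ x

# Toy demonstration on random data.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))        # base layer weights
calib = rng.standard_normal((64, 16))   # calibration inputs
centroids, experts = sparse_expansion_layer(W, calib, n_experts=4, sparsity=0.5)
y = route_and_apply(calib[0], centroids, experts)
```

The performance row above mentions 16 clusters per routing level; the toy uses 4 only to keep the demonstration small. Swapping `magnitude_prune` for a per-cluster input-aware pruner is exactly where the experts would diverge and specialize.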