Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Authors: Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Qualitative and quantitative experiments show that models from these two families learn many similar features. We also utilize our SAEs to investigate induction circuits (Olsson et al., 2022) in Mamba, which are highly analogous to those in Transformers. We include experimental results for the 2.8B models of both families in Appendix D.5 to demonstrate the generalizability of our findings. Figure 3a presents the results of all MPPC experiments on SAE features, namely the random baseline, Pythia-Mamba SAE similarity (main experiment), the model-seed skyline, and the SAE-seed skyline. Figure 3b presents the results of the random baseline and the main experiment on neurons.
Researcher Affiliation Academia Junxuan Wang1, Xuyang Ge1, Wentao Shu1, Qiong Tang1, Yunhua Zhou1, Zhengfu He1,2, Xipeng Qiu1,2 (1Open MOSS Team, School of Computer Science, Fudan University; 2Shanghai Innovation Institute)
Pseudocode No The paper includes mathematical frameworks for Mamba circuits (Section 3.2) and Sparse Autoencoders (Section 3.3) but does not contain a dedicated pseudocode block or algorithm section labeled explicitly as such, nor structured steps formatted like code.
Open Source Code No The paper mentions "1https://huggingface.co/state-spaces/mamba-130m" which is a reference to an open-source Mamba model used by the authors, but it is a third-party resource, not their own implementation code for the methodology described in the paper. There is no explicit statement or link indicating that the authors have released their own source code.
Open Datasets Yes We choose to study Pythia-160M (Biderman et al., 2023) and an open-source version of Mamba130M1. ... They adopt the same tokenizer and both are trained on the Pile dataset (Gao et al., 2021). ... We iterate over 1 million tokens sampled from Slim Pajama ... We validated our approach on three distinct datasets: Slim Pajama (baseline), Red Pajama-Github (specialized code corpus), Open Web Text (general web text)
Dataset Splits No The paper mentions using "1 million tokens sampled from Slim Pajama" and that models were "trained on the Pile dataset", and refers to "each document to 1024 tokens" for activation collection. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for the experiments conducted in the paper.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions using the "Adam optimizer" but does not specify version numbers for any programming languages, libraries, or software packages used in the implementation.
Experiment Setup Yes We set the hidden dimension of SAEs to F = 32D = 24576 for all Sparse Autoencoders (SAEs). We train SAEs with the Adam optimizer with β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸. The learning rate is set to 8e-4 for all SAEs. L1 Regularization Strength: We systematically evaluated the impact of sparsity constraints by training SAEs with three distinct L1 coefficients: 1×10⁻⁴, 2×10⁻⁴ (baseline used in main experiments), and 4×10⁻⁴.
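The quoted setup can be sketched as an SAE objective with the stated dimensions and L1 coefficient. This is a hedged, NumPy-only sketch: the parameter initialization, the ReLU encoder form, and names like `sae_loss` and `l1_coeff` are assumptions for illustration (the paper's exact SAE architecture is not reproduced here); only F = 32D = 24576, lr = 8e-4, and the L1 coefficients come from the quote.

```python
import numpy as np

D = 768          # residual-stream width assumed for a 160M-scale model
F = 32 * D       # SAE dictionary size from the paper: 32D = 24576

rng = np.random.default_rng(0)
W_enc = rng.normal(0.0, 0.01, size=(D, F))
W_dec = rng.normal(0.0, 0.01, size=(F, D))
b_enc = np.zeros(F)

def sae_forward(x):
    """ReLU encoder followed by a linear decoder (illustrative form)."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse feature activations
    x_hat = f @ W_dec                        # reconstruction
    return x_hat, f

def sae_loss(x, l1_coeff=2e-4):
    """Reconstruction MSE plus L1 sparsity penalty on the features.

    l1_coeff=2e-4 is the baseline coefficient quoted above; the
    ablation values would be 1e-4 and 4e-4.
    """
    x_hat, f = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(f).sum(axis=-1))
    return recon + sparsity
```

In a real run this loss would be minimized with Adam(β1=0.9, β2=0.999, ϵ=1e-8) at a learning rate of 8e-4, as quoted; the optimizer loop is omitted here.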