Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Authors: Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Qualitative and quantitative experiments show that models from these two families learn many similar features. We also utilize our SAEs to investigate induction circuits (Olsson et al., 2022) in Mamba, which are highly analogous to those in Transformers. We include experimental results for the 2.8B models of both families in Appendix D.5 to demonstrate the generalizability of our findings. Figure 3a presents the results of all MPPC experiments on SAE features, namely the random baseline, Pythia-Mamba SAE similarity (main experiment), the model-seed skyline, and the SAE-seed skyline. Figure 3b presents the results of the random baseline and the main experiment on neurons.
Researcher Affiliation Academia Junxuan Wang1, Xuyang Ge1, Wentao Shu1, Qiong Tang1, Yunhua Zhou1, Zhengfu He1,2, Xipeng Qiu1,2 (1Open MOSS Team, School of Computer Science, Fudan University; 2Shanghai Innovation Institute)
Pseudocode No The paper includes mathematical frameworks for Mamba circuits (Section 3.2) and Sparse Autoencoders (Section 3.3) but does not contain a dedicated pseudocode block or algorithm section labeled explicitly as such, nor structured steps formatted like code.
Open Source Code No The paper mentions "1https://huggingface.co/state-spaces/mamba-130m" which is a reference to an open-source Mamba model used by the authors, but it is a third-party resource, not their own implementation code for the methodology described in the paper. There is no explicit statement or link indicating that the authors have released their own source code.
Open Datasets Yes We choose to study Pythia-160M (Biderman et al., 2023) and an open-source version of Mamba130M1. ... They adopt the same tokenizer and both are trained on the Pile dataset (Gao et al., 2021). ... We iterate over 1 million tokens sampled from Slim Pajama ... We validated our approach on three distinct datasets: Slim Pajama (baseline), Red Pajama-Github (specialized code corpus), Open Web Text (general web text)
Dataset Splits No The paper mentions using "1 million tokens sampled from Slim Pajama" and that models were "trained on the Pile dataset", and refers to "each document to 1024 tokens" for activation collection. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for the experiments conducted in the paper.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions using the "Adam optimizer" but does not specify version numbers for any programming languages, libraries, or software packages used in the implementation.
Experiment Setup Yes We set the hidden dimension of SAEs to F = 32D = 24576 for all Sparse Autoencoders (SAEs). We train SAEs with the Adam optimizer with β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸. The learning rate is set to 8e-4 for all SAEs. L1 Regularization Strength: We systematically evaluated the impact of sparsity constraints by training SAEs with three distinct L1 coefficients: 1×10⁻⁴, 2×10⁻⁴ (baseline used in main experiments), and 4×10⁻⁴.
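The quoted setup can be sketched as an SAE objective with the stated dimensions and L1 coefficient. This is a hedged, NumPy-only sketch: the parameter initialization, the ReLU encoder form, and names like `sae_loss` and `l1_coeff` are assumptions for illustration (the paper's exact SAE architecture is not reproduced here); only F = 32D = 24576, lr = 8e-4, and the L1 coefficients come from the quote.

```python
import numpy as np

D = 768          # residual-stream width assumed for a 160M-scale model
F = 32 * D       # SAE dictionary size from the paper: 32D = 24576

rng = np.random.default_rng(0)
W_enc = rng.normal(0.0, 0.01, size=(D, F))
W_dec = rng.normal(0.0, 0.01, size=(F, D))
b_enc = np.zeros(F)

def sae_forward(x):
    """ReLU encoder followed by a linear decoder (illustrative form)."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse feature activations
    x_hat = f @ W_dec                        # reconstruction
    return x_hat, f

def sae_loss(x, l1_coeff=2e-4):
    """Reconstruction MSE plus L1 sparsity penalty on the features.

    l1_coeff=2e-4 is the baseline coefficient quoted above; the
    ablation values would be 1e-4 and 4e-4.
    """
    x_hat, f = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(f).sum(axis=-1))
    return recon + sparsity
```

In a real run this loss would be minimized with Adam(β1=0.9, β2=0.999, ϵ=1e-8) at a learning rate of 8e-4, as quoted; the optimizer loop is omitted here.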