Residual Stream Analysis with Multi-Layer SAEs

Authors: Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that multi-layer SAEs achieve comparable reconstruction error and downstream loss to single-layer SAEs while allowing us to directly identify and analyze features that are active at multiple layers (Section 4.1). When aggregating over a large sample of tokens, we find that individual latents are likely to be active at multiple layers, and this measure increases with the number of latents. However, for a single token, latent activations are more likely to be isolated to a single layer. For larger underlying transformers, we show that the residual stream activation vectors at adjacent layers are more similar and that the degree to which latents are active at multiple layers increases."
Researcher Affiliation | Academia | Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison; School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
Pseudocode | No | The paper provides mathematical equations describing the encoder, decoder, loss functions, and tuned-lens transformations. However, it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks with structured steps for a method or procedure.
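Although the paper gives its method only as equations, the Top-K encoder/decoder it describes is simple enough to sketch. The following is a minimal illustrative forward pass in pure Python; all names (`top_k`, `encode`, `decode`) and shapes are our own assumptions, not taken from the paper or the released repository.

```python
# Illustrative sketch of a Top-K SAE forward pass, assuming the standard
# formulation: z = TopK(W_enc x + b_enc), x_hat = W_dec z + b_dec.

def top_k(values, k):
    # Top-K activation: keep the k largest pre-activations, zero the rest.
    # (Ties at the threshold may keep more than k values in this simple version.)
    if k >= len(values):
        return list(values)
    threshold = sorted(values, reverse=True)[k - 1]
    return [v if v >= threshold else 0.0 for v in values]

def encode(x, w_enc, b_enc, k):
    # n latents from a d-dimensional residual stream vector x;
    # w_enc is n rows of length d, b_enc has length n.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    return top_k(pre, k)

def decode(z, w_dec, b_dec):
    # Reconstruction: w_dec is d rows of length n, b_dec has length d.
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(w_dec, b_dec)]
```

For example, with d = 2, n = 4, and k = 1, `encode([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.5]], [0.0, 0.0, 0.0, 0.0], 1)` returns `[0.0, 0.0, 2.0, 0.0]`: only the single largest latent survives the Top-K activation.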
Open Source Code | Yes | "We release our code to train and analyze MLSAEs at https://github.com/tim-lawson/mlsae."
Open Datasets | Yes | "We train MLSAEs primarily on GPT-style language models from the Pythia suite (Biderman et al., 2023)... We train each autoencoder on 1 billion tokens from the Pile (Gao et al., 2020), excluding the copyrighted Books3 dataset"
Dataset Splits | No | "We train each autoencoder on 1 billion tokens from the Pile (Gao et al., 2020)... for a single epoch... We use an effective batch size of 131072 tokens (64 sequences) for all experiments... We report the values of these metrics over one million tokens from the test set."
Hardware Specification | Yes | "We trained most MLSAEs on a single NVIDIA GeForce RTX 3090 GPU for between 12 and 24 hours; we trained the largest MLSAEs (e.g., with Pythia-1b or an expansion factor of R = 256) on a single NVIDIA A100 80GB GPU for up to three days."
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2017) and refers to existing implementations (Gao et al., 2023; Belrose, 2024). However, it does not specify version numbers for any software libraries, programming languages, or other dependencies necessary to replicate the experiment environment.
Experiment Setup | Yes | "Our hyperparameters are the expansion factor R = n/d, the ratio of the number of latents to the model dimension, and the sparsity k, the number of largest latents to keep in the Top-K activation function. We choose expansion factors as powers of 2 between 1 and 256... and k as powers of 2 between 16 and 512... Following Gao et al. (2024), we choose k_aux as a power of 2 close to d/2 and α = 1/32... We use the Adam optimizer (Kingma & Ba, 2017) with the default β parameters, a constant learning rate of 1 × 10^−4, and ε = 6.25 × 10^−10. We use an effective batch size of 131072 tokens (64 sequences) for all experiments."
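The quoted hyperparameter sweep can be enumerated programmatically. The sketch below assumes only the ranges stated above (R and k as powers of 2, k_aux near d/2, α = 1/32, the Adam settings, and the token batch size); the function name and dictionary keys are our own, not taken from the released code.

```python
# Illustrative enumeration of the hyperparameter grid described in the paper.
# Only the numeric ranges come from the text; the structure is an assumption.

def hyperparameter_grid(d_model):
    expansion_factors = [2 ** i for i in range(9)]   # R in {1, 2, ..., 256}
    sparsities = [2 ** i for i in range(4, 10)]      # k in {16, 32, ..., 512}
    # k_aux: the power of 2 closest to d/2, following Gao et al. (2024).
    k_aux = min((2 ** i for i in range(1, 16)), key=lambda p: abs(p - d_model / 2))
    optimizer = {"name": "Adam", "lr": 1e-4, "eps": 6.25e-10}
    return [
        {
            "R": r,
            "n_latents": r * d_model,                # n = R * d
            "k": k,
            "k_aux": k_aux,
            "alpha": 1 / 32,
            "batch_tokens": 131072,                  # 64 sequences per batch
            **optimizer,
        }
        for r in expansion_factors
        for k in sparsities
    ]
```

For instance, taking d_model = 512 (the Pythia-70m residual dimension, used here only for illustration) yields 9 × 6 = 54 configurations, each with k_aux = 256.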