Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Authors: Harrish Thasarathan, Julian Forsyth, Thomas Fel, Matthew Kowal, Konstantinos G. Derpanis

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through qualitative and quantitative evaluations, we show that the resulting concept space captures interpretable features shared across all models. ... This section is split into six parts. We first provide experimental implementation details. Then, we qualitatively analyze universal concepts discovered by USAEs (Sec. 4.1). Next, we provide a quantitative analysis of USAEs through the validation of activation reconstruction (Sec. 4.2), measuring the universality and importance of concepts (Sec. 4.3), and investigating the consistency between concepts in USAEs and individually trained SAE counterparts (Sec. 4.4). Finally, we provide a finer-grained analysis via the application of USAEs to coordinated activation maximization (Sec. 4.5).
Researcher Affiliation Collaboration York University, Toronto, Canada; Vector Institute, Toronto, Canada; Kempner Institute, Harvard University, Boston, USA; FAR.AI; Trajectory Labs, Toronto; University of Toronto, Toronto, Canada; Samsung AI Centre, Toronto.
Pseudocode Yes
    def train_usae(Ψθ, D, A, T, Optimizers):
        M = len(Ψθ)                          # number of models
        for t in range(T):
            i = random(M)                    # sample a source model
            Z = Ψθ[i](A[i])                  # encode model i's activations into shared codes
            L = 0.0
            for j in range(M):
                A_hat_j = Z @ D[j]           # decode into model j's activation space
                L += (A[j] - A_hat_j).norm(p='fro')
            L.backward()
            Optimizers[i].step()
        return Ψθ, D
Figure 3. Training Universal Sparse Autoencoder.
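The training step above can also be sketched numerically. The following NumPy forward pass is a minimal illustration of the universal reconstruction loss, not the authors' implementation: the linear encoder with a ReLU stand-in for the sparsity mechanism, the function name, and all shapes are assumptions made for the example.

```python
import numpy as np

def usae_loss(encoders, decoders, acts, i):
    """Universal reconstruction loss for one sampled model i (illustrative sketch).

    encoders[m]: (d_m, c) linear encoder for model m (hypothetical parameterization)
    decoders[m]: (c, d_m) concept dictionary for model m
    acts[m]:     (n, d_m) batch of activations from model m
    """
    # Encode model i's activations into the shared concept space;
    # ReLU stands in for the paper's sparsity mechanism (an assumption here).
    z = np.maximum(acts[i] @ encoders[i], 0.0)        # (n, c)
    # Decode the shared codes into every model's activation space
    # and accumulate Frobenius-norm reconstruction errors.
    loss = 0.0
    for m in range(len(decoders)):
        recon = z @ decoders[m]                       # (n, d_m)
        loss += np.linalg.norm(acts[m] - recon, ord="fro")
    return loss

rng = np.random.default_rng(0)
dims, c, n = [8, 12, 16], 32, 5                       # three toy models, toy dictionary
encoders = [rng.normal(size=(d, c)) for d in dims]
decoders = [rng.normal(size=(c, d)) for d in dims]
acts = [rng.normal(size=(n, d)) for d in dims]
loss = usae_loss(encoders, decoders, acts, i=0)
print(loss)
```

In the full algorithm this loss would be backpropagated and only model i's optimizer stepped, as in the pseudocode above.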
Open Source Code Yes Code: github.com/YorkUCVIL/UniversalSAE.
Open Datasets Yes We train a USAE on the final layer activations of three popular vision models: DINOv2 (Oquab et al., 2023; Darcet et al., 2024), SigLIP (Zhai et al., 2023), and ViT (Dosovitskiy et al., 2020) (trained on ImageNet (Deng et al., 2009)). ... We use DTD (Cimpoi et al., 2014) and CelebA (Liu et al., 2015) as the validation dataset...
Dataset Splits Yes For all experiments, we train the USAE on the ImageNet training set, while the validation set is reserved for qualitative visualizations and quantitative evaluations.
Hardware Specification Yes We train all USAEs on a single Nvidia RTX 6000 GPU, with training completing in approximately three days (see Appendix A.1 for more implementation details).
Software Dependencies No The models were sourced from the timm library (Wightman, 2019). All SAE encoder-decoder pairs have independent Adam optimizers (Kingma & Ba, 2015). The encoder consists of a single linear layer followed by batch normalization (Ioffe & Szegedy, 2015).
Experiment Setup Yes For all experiments, we use a dictionary of size 6144. All SAE encoder-decoder pairs have independent Adam optimizers (Kingma & Ba, 2015), each with an initial learning rate of 3e-4, which decays to 1e-6 following a cosine schedule with linear warmup. To account for variations in activation scales caused by architectural differences, we standardize each model's activations using 1000 random samples from the training set. Since SigLIP does not incorporate a class token, we remove class tokens from DINOv2 and ViT to ensure consistency across models. Additionally, we interpolate the DINOv2 token count to match a patch size of 16×16 pixels, aligning it with SigLIP and ViT.
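The quoted learning-rate schedule (initial rate 3e-4 decaying to 1e-6 on a cosine schedule with linear warmup) can be sketched as below. The warmup length and total step count are illustrative assumptions; the excerpt does not state them.

```python
import math

def lr_at(step, total_steps, warmup_steps=1000, lr_max=3e-4, lr_min=1e-6):
    """Cosine decay with linear warmup.

    warmup_steps and total_steps are assumed values, not from the paper.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to lr_max.
        return lr_max * (step + 1) / warmup_steps
    # Cosine decay from lr_max down to lr_min over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

schedule = [lr_at(s, total_steps=100_000) for s in range(100_000)]
```

The schedule rises linearly to the peak rate during warmup, then follows a half-cosine down to the floor.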