Interpreting CLIP with Hierarchical Sparse Autoencoders

Authors: Vladimir Zaigrajew, Hubert Baniecki, Przemyslaw Biecek

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments to evaluate MSAE against ReLU and TopK SAEs. We compare the sparsity-fidelity trade-off (Section 4.2) at multiple granularity levels (Section 4.3). We follow with evaluating the semantic quality of learned representations beyond traditional distance metrics (Section 4.4), analyzing decoder orthogonality (Section 4.5), and examining the statistical properties of SAE activation magnitudes (Section 4.6). To verify that MSAE successfully learns hierarchical features, we conduct experiments on the progressive recovery task (Section 4.7).
Researcher Affiliation | Academia | ¹Warsaw University of Technology, Warsaw, Poland; ²University of Warsaw, Warsaw, Poland. Correspondence to: Vladimir Zaigrajew <EMAIL>.
Pseudocode | No | The paper describes the model architecture using mathematical equations (e.g., Equation 1) and prose, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | We make the codebase available at https://github.com/WolodjaZ/MSAE.
Open Datasets | Yes | All SAE models are trained on the CC3M (Sharma et al., 2018) training set with features (post-pooled) from the CLIP ViT-L/14 or ViT-B/16 model. Image modality is evaluated on the ImageNet-1k training set (Russakovsky et al., 2015), while text modality is evaluated on the CC3M validation set.
Dataset Splits | Yes | All SAE models are trained on the CC3M (Sharma et al., 2018) training set with features (post-pooled) from the CLIP ViT-L/14 or ViT-B/16 model. Image modality is evaluated on the ImageNet-1k training set (Russakovsky et al., 2015), while text modality is evaluated on the CC3M validation set.
Hardware Specification | Yes | All models were trained for 30 epochs on a single NVIDIA A100 GPU with batch size 4096, except for the model with an expansion rate of 32, which was trained for 20 epochs.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'ReduceLROnPlateau scheduler' but does not provide specific version numbers for any key software dependencies or libraries.
Experiment Setup | Yes | All models were trained for 30 epochs on a single NVIDIA A100 GPU with batch size 4096, except for the model with an expansion rate of 32, which was trained for 20 epochs. For ViT-L/14, we explored parameters near RN50-optimal values to ensure cross-architecture consistency. With expansion factor 8 (768 → 6144), we explore: Learning rates per method: 1·10⁻⁵, 5·10⁻⁵, 1·10⁻⁴, 5·10⁻⁴, 1·10⁻³. ReLU L1 coefficients (λ): 1·10⁻⁴, 3·10⁻³, 1·10⁻³, 3·10⁻². TopK values: k ∈ {32, 64, 128, 256}. Matryoshka K-lists: {32...6144} and {64...6144}. α coefficients: uniform weighting (UW) {1,1,1,1,1,1,1} and reverse weighting (RW) {7,6,5,4,3,2,1}.
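The sweep reported in the experiment setup can be sketched as a simple grid enumeration. This is a hypothetical reconstruction, not code from the paper: all function and variable names are assumptions, and the Matryoshka variants are represented only by their two α weightings since the full K-lists are elided above.

```python
# Hypothetical sketch of the reported hyperparameter sweep; names are
# assumptions, not identifiers from the MSAE codebase.
from itertools import product

learning_rates = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3]   # shared across methods

relu_l1_coeffs = [1e-4, 3e-3, 1e-3, 3e-2]          # ReLU SAE L1 penalties (lambda)
topk_values = [32, 64, 128, 256]                    # TopK SAE active units (k)
msae_alphas = {
    "UW": [1, 1, 1, 1, 1, 1, 1],                   # uniform weighting
    "RW": [7, 6, 5, 4, 3, 2, 1],                   # reverse weighting
}

def build_grid():
    """Enumerate (method, learning_rate, method_setting) run configurations."""
    runs = []
    runs += [("relu", lr, lam) for lr, lam in product(learning_rates, relu_l1_coeffs)]
    runs += [("topk", lr, k) for lr, k in product(learning_rates, topk_values)]
    runs += [("msae", lr, name) for lr, name in product(learning_rates, msae_alphas)]
    return runs

runs = build_grid()
# 5*4 (ReLU) + 5*4 (TopK) + 5*2 (MSAE weightings) = 50 configurations
```

Each tuple would then be dispatched to a trainer (30 epochs, batch size 4096, per the setup above); the grid itself is method-specific rather than a full cross-product, since λ, k, and α only apply to their respective SAE variants.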