Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Authors: Hyesu Lim, Jinho Choi, Jaegul Choo, Steffen Schneider

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. ... We use our PatchSAE to shed light on the internal mechanisms of foundation models during adaptation tasks. ... Through extensive analysis, we reveal a wide range of interpretable concepts of CLIP... Furthermore, we demonstrate that the SAE latents have a crucial impact on the model prediction in classification tasks through ablation studies.
Researcher Affiliation Academia Hyesu Lim (1,2), Jinho Choi (2), Jaegul Choo (2), Steffen Schneider (1,3). (1) Institute of Computational Biology, Computational Health Center, Helmholtz Munich; (2) KAIST AI; (3) Munich Center for Machine Learning (MCML). Correspondence: EMAIL
Pseudocode No The paper describes the PatchSAE architecture and training objectives using formal equations (Eq. 1 and Eq. 2) and textual descriptions in Section 3.1, but it does not contain a distinct pseudocode or algorithm block.
Open Source Code Yes Code and Demo: github.com/dynamical-inference/patchsae
Open Datasets Yes We train our PatchSAE on a frozen CLIP ViT with an MSE loss and an L1 sparsity regularizer using ImageNet (IN) (§3.1). ...We use a total of 11 benchmark datasets (Reproducibility): ImageNet-1K (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVC Aircraft (Maji et al., 2013), SUN397 (Xiao et al., 2010), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), and UCF101 (Soomro, 2012).
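The training objective quoted above (MSE reconstruction plus an L1 sparsity regularizer on the latents) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: function names, shapes, and the ReLU encoder convention are assumptions.

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=8e-5):
    """One-layer sparse autoencoder loss: MSE reconstruction + L1 sparsity.

    x: (n, d) batch of frozen ViT token activations (shapes are illustrative).
    """
    z = np.maximum(x @ W_enc + b_enc, 0.0)          # ReLU latent activations (assumed)
    x_hat = z @ W_dec + b_dec                       # reconstruction from sparse code
    mse = np.mean((x - x_hat) ** 2)                 # reconstruction term
    l1 = l1_coeff * np.abs(z).sum(axis=-1).mean()   # sparsity penalty on latents
    return mse + l1, z

# Toy example: d=4 input dim, k=8 latents, random untrained weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))
W_enc = rng.normal(size=(4, 8)) * 0.1
W_dec = rng.normal(size=(8, 4)) * 0.1
loss, z = sae_loss(x, W_enc, np.zeros(8), W_dec, np.zeros(4))
```

The l1_coeff default mirrors the λ = 8e-5 reported in the paper's training details; everything else here is a placeholder.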
Dataset Splits Yes Following the setup introduced by Zhou et al. (2022b), we split the downstream task dataset classes into two groups and consider the first half as base and the remaining as novel classes, then conduct classification on two groups separately. In the base-to-novel setting, MaPLe uses few-shot samples from each of the base classes to train the learnable tokens.
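The base-to-novel split described above is a simple deterministic halving of the class list. A minimal sketch, assuming the standard convention from the base-to-novel protocol (the rounding behavior for odd class counts is a guess):

```python
def base_novel_split(class_names):
    """Split classes in half: first half 'base', second half 'novel',
    following the base-to-novel protocol of Zhou et al. (2022b)."""
    mid = len(class_names) // 2  # rounding convention for odd counts is assumed
    return class_names[:mid], class_names[mid:]

base, novel = base_novel_split(["cat", "dog", "car", "plane"])
# base -> ["cat", "dog"], novel -> ["car", "plane"]
```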
Hardware Specification No The paper mentions support from the 'Helmholtz Association's Initiative and Networking Fund on the HAICORE@KIT partition,' which refers to a computing resource but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions using specific model checkpoints for CLIP ViT-B/16 and MaPLe weights, but it does not provide specific software dependencies with version numbers such as programming languages (e.g., Python), libraries (e.g., PyTorch), or CUDA versions.
Experiment Setup Yes Training details. We set the coefficient for the L1 regularizer λ_L1 to 8e-5 and the learning rate to 4e-4 with constant warmup scheduling (warmup step of 500), and initialized the decoder bias with the geometric median. We train the SAE using 2,621,440 samples from the ImageNet training dataset using ghost gradients. We set the threshold τ, used for transforming patch-level activations into global views (Eq. 4), to 0.2 (log10 value of -0.7).
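The thresholded aggregation from patch-level to image-level ("global") activations can be sketched as below. This is one plausible reading of the quoted setup, not the paper's Eq. 4 verbatim: the count-of-firing-patches convention is an assumption, and τ = 0.2 corresponds to the reported log10 value of -0.7 (10^-0.7 ≈ 0.2).

```python
import numpy as np

def global_view(patch_acts, tau=0.2):
    """Aggregate patch-level SAE activations into an image-level view by
    thresholding at tau: for each latent, count how many patches fire
    above tau (the aggregation convention is assumed, not from the paper).

    patch_acts: (num_patches, num_latents) array of SAE activations.
    """
    return (patch_acts > tau).sum(axis=0)

# 3 patches, 2 latents: latent 0 fires in 2 patches, latent 1 in 1.
acts = np.array([[0.5, 0.0],
                 [0.3, 0.1],
                 [0.0, 0.9]])
counts = global_view(acts)  # -> array([2, 1])
```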