Revisiting Discover-then-Name Concept Bottleneck Models: A Reproducibility Study

Authors: Freek Byrman, Emma Kasteleyn, Bart Kuipers, Daniel Uyterlinde

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This study focuses on replicating the key findings reported by Rao et al. (2024). We conclude that the core conceptual ideas are reproducible, but not to the extent presented in the original work. Many representations of active neurons appear to be misaligned with their assigned concepts, indicating a lack of faithfulness in the DN-CBM's explanations. To address this, we propose a model extension: an enhanced alignment method that we evaluate through a user study.
Researcher Affiliation | Academia | Freek Byrman (EMAIL), University of Amsterdam; Emma Kasteleyn (EMAIL), University of Amsterdam; Bart Kuipers (EMAIL), University of Amsterdam; Daniel Uyterlinde (EMAIL), University of Amsterdam
Pseudocode | No | The paper uses mathematical equations to describe models and loss functions (e.g., Equation 1 for L_SAE, Equation 7 for L_FSAE) and diagrams such as Figure 1 for a conceptual overview, but does not include any clearly labeled pseudocode or algorithm blocks.
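Although the paper states the objective only as an equation, the sparse-autoencoder loss the row refers to (reconstruction error plus an L1 sparsity penalty on the latent activations) can be sketched directly. This is a minimal illustration, not the authors' implementation: the ReLU encoder, random weights, and function names are assumptions; the dimensions and λ1 follow Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 1024, 8192        # CLIP embedding dim and SAE latent dim (Table 1)
lam1 = 3e-5              # L1 sparsity weight, λ1 in Table 1

# Randomly initialized encoder/decoder weights, for illustration only.
W_enc = rng.normal(0.0, 0.01, (d, h))
W_dec = rng.normal(0.0, 0.01, (h, d))

def sae_loss(x):
    """L_SAE sketch: ||x - x_hat||^2 + lam1 * ||a||_1, averaged over the batch."""
    a = np.maximum(x @ W_enc, 0.0)    # sparse concept activations (ReLU assumed)
    x_hat = a @ W_dec                 # reconstruction of the CLIP features
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(a), axis=1))
    return recon + lam1 * sparsity

x = rng.normal(size=(4, d))           # a toy batch of CLIP image features
print(float(sae_loss(x)) > 0.0)
```

The L1 term is what drives most latent units to zero, so each image activates only a handful of nameable concepts.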
Open Source Code | Yes | The implementation of our work is available in our GitHub repository.
Open Datasets | Yes | CC3M is used for training and evaluating the SAE (D_extract), whereas ImageNet, Places365*, CIFAR10, CIFAR100, and Waterbirds-100 are used for training and evaluating the linear probe (D_probe):
CC3M. CC3M consists of image-caption pairs generated by extracting text from the alt-texts of web images (Sharma et al., 2018). Due to link rot, approximately 68% of the originally collected images were available in the final dataset. The dataset can be downloaded here.
ImageNet. ImageNet-1K is a standard benchmark for image classification (Deng et al., 2009).
Places365. We use a 10% subset of the Places365 dataset (Zhou et al., 2017), sampled to preserve the original class distribution, for downstream classification. We refer to this subset as Places365*.
CIFAR100. CIFAR100 contains 60,000 images evenly distributed across 100 classes (Krizhevsky, 2009).
Waterbirds-100. The Waterbirds-100 dataset (Petryk et al., 2022; Sagawa et al., 2020) features landbirds and waterbirds with spurious background correlations during training, but not in the test set.
Dataset Splits | Yes | Places365. We use a 10% subset of the Places365 dataset (Zhou et al., 2017), sampled to preserve the original class distribution, for downstream classification. We refer to this subset as Places365*. Waterbirds-100. The Waterbirds-100 dataset (Petryk et al., 2022; Sagawa et al., 2020) features landbirds and waterbirds with spurious background correlations during training, but not in the test set. To reproduce C3, we follow the original DN-CBM model specification by attaching a linear probe to the SAE to classify images. We compute the classification accuracy on ImageNet, Places365*, CIFAR10, and CIFAR100, as specified in Section 3.2.
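The probe setup described above (a linear classifier attached to the SAE's concept activations) can be sketched end to end. This is an illustrative outline under assumed details, not the authors' code: weights are random rather than trained, the ReLU encoder is an assumption, and 365 classes are used only because Places365 is one of the probe datasets.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, n_classes = 1024, 8192, 365   # dims from Table 1; 365 classes as in Places365

# Frozen SAE encoder (assumed already trained) and a linear probe over concepts.
W_enc = rng.normal(0.0, 0.01, (d, h))
W_probe = rng.normal(0.0, 0.01, (h, n_classes))
b_probe = np.zeros(n_classes)

def predict(clip_features):
    """Pipeline sketch: CLIP features -> sparse concepts -> linear probe -> labels."""
    concepts = np.maximum(clip_features @ W_enc, 0.0)  # sparse concept activations
    logits = concepts @ W_probe + b_probe              # class scores from concepts
    return logits.argmax(axis=1)

batch = rng.normal(size=(8, d))      # a toy batch of CLIP image features
preds = predict(batch)
print(preds.shape)                   # (8,)
```

Because the probe sees only concept activations, each class weight vector doubles as an importance ranking over named concepts, which is what makes the bottleneck interpretable.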
Hardware Specification | Yes | The computational tasks were carried out using GPU resources provided by the Dutch national supercomputer, Snellius, with funding support from the University of Amsterdam. An NVIDIA A100 Tensor Core GPU was used to run the experiments.
Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'CLIP ResNet-50' but does not provide specific version numbers for these or any other software libraries or dependencies.
Experiment Setup | Yes | We reproduce the experiments using the hyperparameters from Rao et al. (2024), as reported in Table 1, with v2 as the default for standard classification and v3 for concept interventions, unless otherwise specified. Table 1: Hyperparameters. We use the same hyperparameters as Rao et al. (2024), though the original paper describes a hyperparameter sweep without detailing the probe settings. For standard classification, we evaluate two linear probe configurations: v1, based on the GitHub README example for Places365, and v2, the default settings provided in the code for each probe dataset. A third configuration, v3, is used specifically for concept intervention experiments. The Adam optimizer is applied with its default hyperparameter settings (Kingma and Ba, 2015).
General: text encoder (T): CLIP ResNet-50; image encoder (I): ResNet-50; vocabulary (V): Google 20k; vocabulary size (|V|): 20000; embedding dim (d): 1024.
SAE: latent dim (h): 8192; L1 sparsity (λ1): 3×10^-5; learning rate: 0.1; epochs: 200; batch size: 4096; batch resample freq: 10; optimizer: Adam.
Probe (v1 / v2 / v3): learning rate: 10^-2 / 10^-3 / 10^-1; batch size: 512 / 512 / 512; epochs: 200 / 200 / 200; L1 sparsity (λ2): 0.1 / 1 / 10; optimizer: Adam / Adam / Adam; top-k pruning: 5.
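For reference, the Table 1 hyperparameters can be collected into a single configuration sketch. The dictionary layout and key names are ours; the values are transcribed from the table, with the probe settings keyed by configuration (v1: README example, v2: per-dataset code defaults, v3: concept interventions).

```python
# Sketch of the Table 1 hyperparameters as a plain config dict.
# Structure and key names are assumptions; values are from the table.
CONFIG = {
    "general": {
        "text_encoder": "CLIP ResNet-50",
        "image_encoder": "ResNet-50",
        "vocabulary": "Google 20k",
        "vocabulary_size": 20000,
        "embedding_dim": 1024,
    },
    "sae": {
        "latent_dim": 8192,
        "l1_sparsity": 3e-5,       # λ1
        "learning_rate": 0.1,
        "epochs": 200,
        "batch_size": 4096,
        "batch_resample_freq": 10,
        "optimizer": "Adam",
    },
    "probe": {
        "learning_rate": {"v1": 1e-2, "v2": 1e-3, "v3": 1e-1},
        "batch_size": {"v1": 512, "v2": 512, "v3": 512},
        "epochs": {"v1": 200, "v2": 200, "v3": 200},
        "l1_sparsity": {"v1": 0.1, "v2": 1, "v3": 10},  # λ2
        "optimizer": "Adam",
        "top_k_pruning": 5,
    },
}

print(CONFIG["probe"]["learning_rate"]["v2"])  # 0.001
```

Keeping the three probe variants side by side in one structure makes the reproduction's default choices (v2 for classification, v3 for interventions) explicit and easy to switch.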