Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence

Authors: Frederik Pahde, Maximilian Dreyer, Moritz Weckbecker, Leander Weber, Christopher J. Anders, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate various CAV methods in terms of their alignment with the true concept direction and their impact on CAV applications, including concept sensitivity testing and model correction for shortcut behavior caused by data artifacts. We demonstrate the benefits of pattern-based CAVs using the Pediatric Bone Age, ISIC2019, and FunnyBirds datasets with VGG, ResNet, ReXNet, EfficientNet, and Vision Transformer as model architectures.
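The pattern-based CAVs evaluated above can be illustrated with a minimal sketch, assuming the covariance-based ("pattern") formulation in which the concept direction is estimated from the covariance between latent activations and binary concept labels rather than from a classifier's weight vector. All names below are illustrative, not the paper's actual implementation:

```python
import numpy as np

def pattern_cav(activations, concept_labels):
    """Covariance-based ("pattern") concept direction (illustrative sketch).

    activations:    (n_samples, n_features) latent activations
    concept_labels: (n_samples,) binary labels (1 = concept present)

    The direction is the per-feature covariance with the concept label,
    normalized to unit length.
    """
    a = activations - activations.mean(axis=0)
    t = concept_labels - concept_labels.mean()
    pattern = a.T @ t / (len(t) - 1)  # per-feature covariance with the label
    return pattern / np.linalg.norm(pattern)

# Toy usage: the concept adds a fixed offset along one latent feature.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 8))
labels = rng.integers(0, 2, size=200)
acts[labels == 1, 0] += 2.0  # concept signal lives in feature 0
cav = pattern_cav(acts, labels)  # should point mostly along feature 0
```

The key design point is that a covariance-based direction recovers the concept's signal direction even when a discriminative weight vector would be rotated by correlated noise in the other features.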
Researcher Affiliation | Collaboration | (1) Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute; (2) Department of Electrical Engineering and Computer Science, Technische Universität Berlin; (3) BIFOLD (Berlin Institute for the Foundations of Learning and Data)
Pseudocode | No | The paper uses mathematical equations to describe the optimization tasks (Eq. 1, 2, 3) and does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/frederikpahde/pattern-cav
Open Datasets | Yes | We demonstrate the benefits of pattern-based CAVs using the Pediatric Bone Age, ISIC2019, and FunnyBirds datasets with VGG, ResNet, ReXNet, EfficientNet, and Vision Transformer as model architectures. ... Specifically, we insert artificial concepts into ISIC2019 (Codella et al., 2018; Tschandl et al., 2018; Combalia et al., 2019) ... and a Pediatric Bone Age dataset (Halabi et al., 2019) ... Lastly, we use FunnyBirds (Hesse et al., 2023) ...
Dataset Splits | Yes | Details for our controlled Clever Hans datasets ... train/val/test split: 80%/10%/10%. ... We synthesize 500 training samples and 100 test samples per class, totaling 5,000 training and 1,000 test samples. The training set is further split into training/validation splits (90%/10%).
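The quoted 80%/10%/10% split can be reproduced with a short helper. This is a hypothetical sketch; the paper does not publish its exact splitting code, and the function name and seed are assumptions:

```python
import random

def three_way_split(items, fracs=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and partition items into train/val/test subsets.

    Illustrative helper matching the quoted 80/10/10 ratio; not the
    authors' actual splitting code.
    """
    assert abs(sum(fracs) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = three_way_split(range(1000))
# yields 800 / 100 / 100 items
```

Fixing the shuffle seed is what makes such a split reproducible across runs, which is exactly the property the reproducibility variable above is probing for.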
Hardware Specification | Yes | We ran all model training and correction jobs on GPUs of type NVIDIA Ampere A100 with 40 GB RAM.
Software Dependencies | No | The paper mentions 'timm' and 'torchvision' as sources for pre-trained models and 'zennit' for attribution heatmaps, but does not provide specific version numbers for these software libraries or for other key dependencies such as Python or PyTorch.
Experiment Setup | Yes | Table 4: Model training details including the pre-trained checkpoint, optimizer, learning rate (LR), number of epochs, and milestones, after which the learning rate is divided by 10. ... Model correction is performed with RR-ClArC for 10 epochs with the initial training learning rate (see Table 4) divided by 10. To balance between classification loss and the added loss term L_RR, we weigh the latter term with λ ∈ {10^5, 10^6, ..., 10^10}.
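The λ-weighted objective described in the experiment setup can be sketched as a plain weighted sum of the classification loss and the right-reason term. This is a NumPy stand-in, not the authors' PyTorch training code; `loss_rr` is a placeholder scalar for the RR-ClArC penalty, whose exact form is defined in the paper:

```python
import numpy as np

def corrected_loss(logits, targets, loss_rr, lam=1e6):
    """Total model-correction objective: cross-entropy + lam * L_RR.

    Illustrative sketch of the quoted setup, where lam is swept over
    {1e5, 1e6, ..., 1e10} to balance the two terms.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(targets)), targets].mean()
    return ce + lam * loss_rr

# Toy usage: two samples, two classes, a small right-reason penalty.
logits = np.array([[2.0, 0.5], [0.1, 1.5]])
targets = np.array([0, 1])
total = corrected_loss(logits, targets, loss_rr=1e-7, lam=1e6)
```

Because L_RR is typically orders of magnitude smaller than the classification loss, sweeping λ over such a wide range is what lets the correction term meaningfully influence training without destroying task accuracy.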