ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Authors: Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover, Pinar Yanardag, Duen Horng Chau
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of CONCEPTATTENTION in a zero-shot semantic segmentation task on real-world images. We compare our interpretative maps against annotated segmentations to measure the accuracy and relevance of the attributions generated by our method. Our experiments and extensive comparisons demonstrate that CONCEPTATTENTION provides valuable insights... and CONCEPTATTENTION achieves state-of-the-art performance in zero-shot segmentation on benchmarks like ImageNet-Segmentation and PASCAL VOC across multiple DiT architectures. We perform several ablation studies to investigate the impact of various architectural choices and hyperparameters on the performance of CONCEPTATTENTION. |
| Researcher Affiliation | Collaboration | ¹Georgia Tech, ²Virginia Tech, ³IBM Research. Correspondence to: Alec Helbling <EMAIL>. |
| Pseudocode | Yes | A. More In-depth Explanation of ConceptAttention. We show pseudo-code depicting the difference between a vanilla multi-modal attention mechanism and a multi-modal attention mechanism with concept attention added to it. Figure 9. Pseudo-code depicting the (a) multi-modal attention operation used by Flux DiTs and (b) our CONCEPTATTENTION operation. |
| Open Source Code | Yes | Code: alechelbling.com/ConceptAttention/ |
| Open Datasets | Yes | This evaluation protocol centers around the ImageNet-Segmentation dataset (Guillaumin et al., 2014), and we extend this evaluation to the PASCAL VOC dataset (Everingham et al., 2015). |
| Dataset Splits | Yes | We investigate both a single class (930 images) and multi-class split (1,449 images) of this dataset. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, etc.) used for running experiments are explicitly mentioned in the paper. |
| Software Dependencies | No | Flux DiT: For most of our experiments we use the Flux DiT architecture implemented in PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | In our experiments we leverage the activations from the last 10 of the 18 MMATTN layers. ... Throughout the rest of our experiments we use timestep 500 out of 1000 following this result. |
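The pseudocode row above contrasts a vanilla multi-modal attention operation with the CONCEPTATTENTION variant (Figure 9 of the paper). A minimal sketch of that contrast is given below. The function names, tensor shapes, and the output-space saliency computation are illustrative assumptions, not the authors' implementation; the key property preserved here is that concept tokens read from image tokens while the image/text pathway never attends to concepts, leaving the generative forward pass unchanged.

```python
# Hedged sketch of Figure 9's contrast; names and shapes are assumptions.
import torch
import torch.nn.functional as F


def mm_attention(q_txt, k_txt, v_txt, q_img, k_img, v_img):
    """(a) Vanilla multi-modal attention as used by Flux DiTs:
    text and image tokens jointly attend over one concatenated sequence."""
    q = torch.cat([q_txt, q_img], dim=-2)
    k = torch.cat([k_txt, k_img], dim=-2)
    v = torch.cat([v_txt, v_img], dim=-2)
    out = F.scaled_dot_product_attention(q, k, v)
    n_txt = q_txt.shape[-2]
    return out[..., :n_txt, :], out[..., n_txt:, :]


def concept_attention(q_cpt, k_cpt, v_cpt, k_img, v_img):
    """(b) ConceptAttention-style side branch: concept tokens attend to
    themselves and to the image keys/values. Image and text tokens never
    attend to the concepts, so generation is unaffected."""
    k = torch.cat([k_cpt, k_img], dim=-2)
    v = torch.cat([v_cpt, v_img], dim=-2)
    return F.scaled_dot_product_attention(q_cpt, k, v)


def saliency_maps(o_cpt, o_img):
    """Per-patch concept saliency from the attention *output* space:
    similarity of image-patch outputs to concept outputs, softmaxed
    over concepts (illustrative aggregation)."""
    sim = o_img @ o_cpt.transpose(-1, -2)  # (..., n_patches, n_concepts)
    return sim.softmax(dim=-1)
```

Per the Experiment Setup row, such maps would be aggregated over the last 10 of the 18 multi-modal attention layers at timestep 500 of 1000.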