Enhancing Performance of Explainable AI Models with Constrained Concept Refinement

Authors: Geyu Liang, Senne Michielssen, Salar Fattahi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Additionally, we evaluate the practical performance of our proposed framework in generating explainable predictions for image classification tasks across various benchmarks. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost. Empirical evaluation. We conduct experiments on multiple benchmark datasets for image classification tasks to assess the practical effectiveness of our approach (Section 4)."
Researcher Affiliation | Academia | "(1) Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, US. (2) Department of Computer Science, Princeton University, Princeton, NJ, US. Correspondence to: Salar Fattahi <EMAIL>."
Pseudocode | Yes | "Our meta-algorithm for Problem (3), called constrained concept refinement (CCR), is presented in Algorithm 1. Algorithm 1 Constrained Concept Refinement... We formally introduce our algorithm in Algorithm 2. Algorithm 2 CCR for Interpretable Image Classification... In Algorithm 3, we present the pseudo-code for concept dispersion. Algorithm 3 Concept dispersion... Algorithm 4 presents the pseudo-code for the projection and normalization step in Algorithm 2. Algorithm 4 Embedding normalization and projection."
Open Source Code | Yes | "The Python implementation of the algorithm can be found here: github.com/lianggeyuleo/CCR.git."
Open Datasets | Yes | "We demonstrate the practical efficacy of CCR on multiple image classification benchmarks including CIFAR-10/100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), CUB-200 (Wah et al., 2011) and Places365 (Zhou et al., 2017)."
Dataset Splits | Yes | "The evaluation is conducted across five image classification benchmarks: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), CUB-200 (Wah et al., 2011), and Places365 (Zhou et al., 2017). For the CIFAR-10/100 and CUB-200 datasets, we tune CLIP-IP-OMP to match the average sparsity level s, also referred to as the explanation length or k, used in CCR. For ImageNet and Places365, we report the best accuracy achieved by CLIP-IP-OMP across all explanation lengths."
Hardware Specification | Yes | "In our computational environment, using an NVIDIA Tesla V100 GPU, CLIP-IP-OMP remains comparably expensive, requiring 33 hours for k = 50. All experiments reported in this section were performed in Python 3.9 on a MacBook Pro (14-inch, 2021) equipped with an Apple M1 Pro chip."
Software Dependencies | No | "All experiments reported in this section were performed in Python 3.9 on a MacBook Pro (14-inch, 2021) equipped with an Apple M1 Pro chip." (Only a programming language version is provided, not specific libraries or solvers with versions.)
Experiment Setup | Yes | "The constraint parameter ρ for CCR is fixed at 0.1 for all experiments. For the results shown in Figure 4, we set d = 10, k = 5, ρ = 0.2, γ = 0.5, and Γ = 1. The first column of Figure 4 illustrates the scenario corresponding to Theorem 3.3, in which only a single input feature x is available. Here, we choose n = 8 and η = 10⁻². In the second column of Figure 4, we apply projected gradient descent to minimize L_m, as defined in Equation (6), under the assumption that D is rank-deficient. Specifically, we set n = 8 < d and η = 10⁻¹."
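The quoted algorithms themselves are not reproduced in this report. As a rough illustration of the constrained-refinement idea described in Algorithms 1 and 4 (a gradient step on the concept embeddings followed by a projection), here is a minimal sketch assuming the constraint keeps each refined embedding within a radius-ρ ball of its initialization; the function names and the exact constraint set are assumptions, not the paper's formulation.

```python
import numpy as np

def project_to_ball(W, W0, rho):
    """Project each row of W back into a ball of radius rho around the
    corresponding row of W0. (Hypothetical constraint set standing in for
    the paper's embedding normalization/projection step, Algorithm 4.)"""
    delta = W - W0
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.minimum(1.0, rho / np.maximum(norms, 1e-12))
    return W0 + delta * scale

def ccr_step(W, W0, grad, eta, rho):
    """One refinement step: gradient descent on the concept embeddings,
    followed by projection back onto the constraint set."""
    return project_to_ball(W - eta * grad, W0, rho)
```

Keeping the refined embeddings close to their initialization is what lets the concepts retain their original interpretation while the fit improves; the projection enforces that trade-off explicitly via ρ.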
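For context on the quoted synthetic setup (n = 8 < d = 10, ρ = 0.2, η = 10⁻¹), the following sketch runs projected gradient descent with a rank-deficient design. The loss L_m of Equation (6) is not reproduced in this report, so a least-squares objective and a synthetic target serve as illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 5, 8            # dimensions from the quoted setup; n = 8 < d
rho, eta = 0.2, 1e-1          # constraint radius and step size from the setup

X = rng.normal(size=(n, d))   # only n < d samples, so the design is rank-deficient
A0 = rng.normal(size=(d, k))  # initial concept matrix (illustrative)
A_true = A0 + 0.5 * rho * rng.normal(size=(d, k))  # hidden target (illustrative)
Y = X @ A_true

A = A0.copy()
for _ in range(500):
    grad = X.T @ (X @ A - Y) / n   # gradient of the stand-in loss (1/2n)||XA - Y||^2
    A = A - eta * grad
    # project each column back into a rho-ball around its initialization
    delta = A - A0
    norms = np.linalg.norm(delta, axis=0, keepdims=True)
    A = A0 + delta * np.minimum(1.0, rho / np.maximum(norms, 1e-12))
```

Because rank(X) ≤ n < d, the unconstrained problem has infinitely many minimizers; the projection step resolves this ambiguity by anchoring the solution near A0, which mirrors the role the constraint plays in the rank-deficient experiment described above.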