From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection

Authors: Moritz Vandenhirtz, Julia E. Vogt

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically on semi-synthetic and natural image datasets that our inherently interpretable classifier produces more meaningful, human-understandable predictions than state-of-the-art benchmarks.
Main Contributions | This work contributes to the line of research on instance-wise feature selection in multiple ways. (i) We propose a novel, semantic region-based approach to sparsify input images for inherently interpretable predictions. (ii) We propose a dynamic thresholding that adjusts the sparsity depending on the required amount of information. (iii) We conduct a thorough empirical assessment on semi-synthetic and natural image datasets. In particular, we show that P2P (a) retains the predictive performance of black-box models, (b) identifies instance-specific relevant regions along with their relationships, and (c) faithfully leverages these regions to perform its prediction.
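The dynamic thresholding in contribution (ii) can be read as a greedy loop: reveal semantic regions in order of estimated relevance until the classifier's top-class probability clears the certainty threshold δ, so easy images stay sparse and hard images get more evidence. This is an illustrative sketch, not P2P's actual implementation; `region_scores`, `classify`, and the toy logits are all hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_select(region_scores, classify, delta=0.99):
    """Greedily reveal regions (most relevant first) until the
    classifier's top-class probability exceeds delta.
    `classify` maps a list of active region ids to class logits;
    both names are illustrative, not P2P's API."""
    order = sorted(range(len(region_scores)),
                   key=lambda i: region_scores[i], reverse=True)
    active = []
    for idx in order:
        active.append(idx)
        if max(softmax(classify(active))) >= delta:
            break  # certain enough: stop revealing regions
    return active

# Toy two-class model: confidence grows with total revealed relevance.
scores = [0.9, 0.6, 0.3, 0.1]
def toy_classify(active):
    evidence = sum(scores[i] for i in active)
    return [5.0 * evidence, 1.0]

print(dynamic_select(scores, toy_classify, delta=0.99))  # [0, 1]
```

With δ = 0.99 the toy model stops after two regions; lowering δ would stop it after one, which is exactly the sparsity/certainty trade-off the paper's δ controls.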
Researcher Affiliation | Academia | Moritz Vandenhirtz and Julia E. Vogt, Department of Computer Science, ETH Zurich, Switzerland. Correspondence to: Moritz Vandenhirtz <EMAIL>.
Pseudocode | No | The paper describes the method using mathematical formulations, textual descriptions, and a schematic overview (Figure 2), but does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | The code is available at www.github.com/mvandenhi/P2P
Open Datasets | Yes | We evaluate performance and faithfulness on the natural image datasets CIFAR-10 (Krizhevsky et al., 2009) with 10 classes, and ImageNet (Russakovsky et al., 2015) with 1000 classes. To additionally compute localization, we use ImageNet-9 (Xiao et al., 2021), a subset of ImageNet with 9 coarse-grained classes for which object segmentations have been made available. Furthermore, we introduce COCO-10, a subset of MS COCO (Lin et al., 2014), with the classes {Bed, Car, Cat, Clock, Dog, Person, Sink, Train, TV, Vase}. Lastly, we use the semi-synthetic BAM dataset from Yang & Kim (2019); its results, deferred to Appendix B to save space, support the same conclusions.
Dataset Splits | Yes | Thus, we obtain 1000 training and 100 test images per class and use the object segmentations of MS COCO. We train for 20 epochs on ImageNet and 100 epochs on all other datasets with a batch size of 64.
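The reported 1000 train / 100 test images per class can be reproduced with a simple per-class shuffle-and-slice. The paper does not state how its split was drawn, so the fixed-seed procedure below is an assumption for illustration only.

```python
import random

def per_class_split(image_ids_by_class, n_train=1000, n_test=100, seed=0):
    """Illustrative per-class split: shuffle each class's image ids
    with a fixed seed, then take the first n_train for training and
    the next n_test for testing. Hypothetical helper, not P2P code."""
    rng = random.Random(seed)
    splits = {}
    for cls, ids in image_ids_by_class.items():
        ids = list(ids)
        rng.shuffle(ids)
        splits[cls] = {"train": ids[:n_train],
                       "test": ids[n_train:n_train + n_test]}
    return splits

# Toy id pools standing in for per-class MS COCO image ids.
demo = {"Cat": range(1500), "Dog": range(1300)}
s = per_class_split(demo)
print(len(s["Cat"]["train"]), len(s["Cat"]["test"]))  # 1000 100
```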
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types) used to run the experiments. It mentions using pre-trained models such as MobileNetV3 and ViT-Tiny, but not the hardware on which they were trained or evaluated.
Software Dependencies | No | The paper mentions the Adam optimizer and Fast-SLIC for superpixels, but it does not provide version numbers for these or for any other software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | All models are trained using Adam (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, and learning rate 1e-4. We train for 20 epochs on ImageNet and 100 epochs on all other datasets with a batch size of 64. To get superpixels, we use Fast-SLIC (Kim, 2021) with m = 20 and 100 segments, determined by visual inspection. As described in Section 3, any large λ1 leads to p → τ, so we set it to 10, and λ2 = 0.01 against overfitting. At inference, we set the certainty threshold δ to 0.8 for ImageNet and 0.99 for all other datasets, and determine active regions by thresholding probabilities at 0.5 instead of sampling.
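The quoted hyperparameters and the inference-time selection rule can be collected into a small sketch. The CONFIG keys are illustrative names for the values quoted above, not identifiers from the released code, and `active_regions` is a hypothetical helper for the train-time sampling vs. inference-time 0.5-thresholding distinction.

```python
import random

# Hyperparameters as quoted in the paper's setup; key names are illustrative.
CONFIG = {
    "optimizer": "Adam", "beta1": 0.9, "beta2": 0.999, "lr": 1e-4,
    "batch_size": 64, "lambda1": 10.0, "lambda2": 0.01,
    "n_superpixels": 100, "slic_m": 20,          # Fast-SLIC: 100 segments, m = 20
    "delta": {"imagenet": 0.8, "default": 0.99},  # certainty threshold
}

def active_regions(probs, deterministic=True, rng=None):
    """Turn per-region selection probabilities into a binary mask.
    Training samples each region from Bernoulli(p); at inference the
    paper instead thresholds deterministically at 0.5."""
    if deterministic:
        return [p >= 0.5 for p in probs]
    rng = rng or random.Random(0)
    return [rng.random() < p for p in probs]

print(active_regions([0.9, 0.49, 0.5, 0.1]))  # [True, False, True, False]
```

The deterministic path is what makes the explanation reproducible across runs: the same image always yields the same set of revealed regions.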