Pixel-level Certified Explanations via Randomized Smoothing
Authors: Alaa Anani, Tobias Lorenz, Mario Fritz, Bernt Schiele
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks. Our experiments show significant differences between methods in all four dimensions, with LRP and RISE demonstrating the best trade-offs between certified robustness, localization, faithfulness, and meaningful visual attributions. |
| Researcher Affiliation | Academia | ¹Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany. ²CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Alaa Anani <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes mathematically and textually, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Our code is at https://github.com/AlaaAnani/certified-attributions. |
| Open Datasets | Yes | We use the ImageNet (Russakovsky et al., 2015) dataset and 5 classification models: ResNet-18 (He et al., 2016), Wide ResNet-50-2 (Zagoruyko & Komodakis, 2016), ResNet-152 (He et al., 2016), VGG-11 (Simonyan & Zisserman, 2015), and ViT-B/16 (Dosovitskiy et al., 2020). |
| Dataset Splits | Yes | We select images that were classified with a high confidence by ResNet-18 and VGG-11, ensuring the top K% pixels constitute positive evidence for the correct class. Due to the computational cost of sampling during certification, we randomly select 100 images from the validation set, on which we compute all results. The image dimension is 224 × 224. ... We create 2 × 2 grids of images from distinct classes, sampled from the confidence-filtered validation set. For each attribution method, we generate 100 such grids to compute GridPG and Certified GridPG scores. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Certification We use noise level σ = 0.15, number of samples n = 100 per image, top class probability τ = 0.75 and α = 0.001, unless stated otherwise. All certified results are robust with confidence 1 − α w.r.t. the radius R = 0.10, unless stated otherwise. ... Attribution methods LRP (Bach et al., 2015) Following (Montavon et al., 2017; Arias-Duart et al., 2021; Rao et al., 2022), we use the configuration that applies the ε-rule to the fully connected layers in the network, with ε = 0.25, and the z⁺-rule to the convolutional layers, except for the first convolutional layer where we use the z^B-rule. Occlusion (Zeiler & Fergus, 2014) Occlusion uses a sliding window of size K and stride s over the input. Following (Rao et al., 2022), we use K = 16, s = 8 for the input layer and K = 5, s = 2 for the final layer. RISE (Petsiuk et al., 2018b) We use a total of N = 6000 randomly generated masks per layer. Masks are initially generated on a low-resolution grid of size s × s (with default s = 6), where each cell is activated with probability 0.1. Each binary grid is bilinearly upsampled and cropped to match the input dimensions. |
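The certification setup above (noise level σ, n samples per image, threshold τ, confidence level α) follows the randomized-smoothing recipe: sample noisy copies of the input, count how often each pixel lands in the attribution's top-K set, and certify pixels whose membership frequency is provably above τ. The sketch below is a simplified illustration, not the authors' exact procedure: the function names are hypothetical, and it uses a Hoeffding-style lower confidence bound in place of an exact binomial test.

```python
import numpy as np

def certify_topk_pixels(attribute, image, K, sigma=0.15, n=100,
                        tau=0.75, alpha=0.001, rng=None):
    """Certify pixel membership in the top-K attribution set under
    Gaussian input noise. `attribute` maps an image to a per-pixel
    attribution map of the same spatial shape (hypothetical callable)."""
    rng = np.random.default_rng(rng)
    H, W = image.shape[-2:]
    counts = np.zeros(H * W)
    for _ in range(n):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        attr = attribute(noisy).reshape(-1)
        topk = np.argpartition(attr, -K)[-K:]  # indices of K largest scores
        counts[topk] += 1
    p_hat = counts / n
    # Hoeffding lower confidence bound, valid with probability >= 1 - alpha
    p_lower = p_hat - np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
    return (p_lower >= tau).reshape(H, W)  # boolean mask of certified pixels
```

With the paper's defaults (n = 100, α = 0.001), a pixel must appear in the top-K set in nearly every noisy sample before its lower bound clears τ = 0.75, which is why certification is sample-hungry and only 100 validation images are used.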
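The Occlusion configuration (window size K, stride s) can be made concrete with a minimal sketch: slide a K × K window over the input, replace the covered region with a baseline value, and attribute the drop in the target-class score to the occluded pixels. This is a generic illustration of Zeiler & Fergus-style occlusion under an assumed single-output `model` callable, not the paper's implementation.

```python
import numpy as np

def occlusion_attribution(model, image, target, K=16, s=8, baseline=0.0):
    """Sliding-window occlusion attribution. `model` maps an image to a
    vector of class scores (hypothetical callable); the heatmap records
    the average score drop caused by occluding each pixel's windows."""
    H, W = image.shape[-2:]
    heat = np.zeros((H, W))
    counts = np.zeros((H, W))
    base_score = model(image)[target]
    for y in range(0, H - K + 1, s):
        for x in range(0, W - K + 1, s):
            occluded = image.copy()
            occluded[..., y:y + K, x:x + K] = baseline  # mask the window
            drop = base_score - model(occluded)[target]
            heat[y:y + K, x:x + K] += drop
            counts[y:y + K, x:x + K] += 1
    return heat / np.maximum(counts, 1)  # average over overlapping windows
```

With K = 16 and s = 8 on a 224 × 224 input, windows overlap by half, so each interior pixel's score is averaged over four window positions.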
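The RISE mask generation described above (N binary masks drawn on a low-resolution s × s grid with activation probability 0.1, then bilinearly upsampled and randomly cropped) can be sketched as follows. The helper names and the oversize-then-crop margin are illustrative assumptions; the paper cites Petsiuk et al. (2018) for the actual procedure.

```python
import numpy as np

def bilinear_upsample(grid, out_size):
    """Bilinearly upsample a square 2-D grid to out_size x out_size."""
    s = grid.shape[0]
    xs = np.linspace(0, s - 1, out_size)
    i0 = np.floor(xs).astype(int)
    i1 = np.minimum(i0 + 1, s - 1)
    w = xs - i0
    rows = grid[i0] * (1 - w)[:, None] + grid[i1] * w[:, None]
    return rows[:, i0] * (1 - w) + rows[:, i1] * w

def generate_rise_masks(n_masks=6000, s=6, p=0.1, input_size=224, rng=None):
    """RISE-style masks: s x s Bernoulli(p) grids, bilinearly upsampled
    and randomly cropped to the input resolution."""
    rng = np.random.default_rng(rng)
    cell = int(np.ceil(input_size / s))     # pixels per low-res grid cell
    up = (s + 1) * cell                     # oversized so a random crop fits
    masks = np.empty((n_masks, input_size, input_size), dtype=np.float32)
    for i in range(n_masks):
        grid = (rng.random((s, s)) < p).astype(np.float32)
        big = bilinear_upsample(grid, up)
        x, y = rng.integers(0, cell, size=2)  # random sub-cell shift
        masks[i] = big[x:x + input_size, y:y + input_size]
    return masks
```

The random crop shifts each upsampled grid by a sub-cell offset, so mask edges do not align with a fixed lattice across the N = 6000 samples.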