reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Attribution-based Explanations that Provide Recourse Cannot be Robust

Authors: Hidde Fokkema, Rianne de Heide, Tim van Erven

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We further illustrate our main impossibility result with experiments and analytical examples that show cases in which the well-known attribution methods SmoothGrad (Smilkov et al., 2017), Integrated Gradients (Sundararajan et al., 2017), LIME (Ribeiro et al., 2016) and SHAP (Lundberg and Lee, 2017) fail to be recourse sensitive. We also provide an analytical example in which counterfactual explanations fail to be continuous. We then reflect on our impossibility result in Section 4, and discuss possible ways around it.
Researcher Affiliation	Academia	Hidde Fokkema, Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands; Rianne de Heide, Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1111, 1081 HV Amsterdam, The Netherlands; Tim van Erven, Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands
Pseudocode	No	The paper describes methods mathematically and provides analytical examples, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	All the code to reproduce the experiments and figures in this paper can be found in a GitHub repository (github.com/HiddeFok/recourse-robust-explanations-impossible).
Open Datasets	No	A total of 53 gray scale figures were created from the User Icon picture, found on www.iconarchive.com. Each figure consists of two components, the person and a background. The figures have varying contrasts between these two components. We labeled each figure by hand according to this contrast.
Dataset Splits	No	The paper describes the creation and labeling of a custom 'Profile Picture Toy Dataset' and how a threshold parameter was chosen for a perfect classifier, but it does not specify any training, validation, or test dataset splits.
Hardware Specification	Yes	All experiments were run locally on an Apple MacBook Pro M1 13", 2020 with 8GB of RAM.
Software Dependencies	Yes	For LIME we used version 0.2.0.1 and for SHAP version 0.40.0. ... Finally, for some of the picture manipulation we used the scikit-image (van der Walt et al., 2011) package, version 1.0, under the BSD 3-Clause License.
Experiment Setup	Yes	The classification function is given by ... The attribution methods based on gradients were calculated analytically. The attributions for the Vanilla Gradients, SmoothGrad and Integrated Gradients are given by ... where we have chosen x0 = 0 as the baseline. ... By increasing the threshold from the minimum value of all quadratic differences to the maximum value, the parameter with the highest accuracy was chosen. This led to the choice λ_thres = 5961.34, which achieved perfect accuracy across both classes.
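
The threshold selection quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: the function names, the evenly spaced grid of `num_steps` candidates, the decision rule `score > threshold`, the tie-breaking (the first best threshold wins), and the toy data are all assumptions made for illustration. The sketch only shows the general idea of sweeping a threshold from the minimum to the maximum score and keeping the value with the highest accuracy.

```python
def accuracy(scores, labels, threshold):
    """Fraction of examples where (score > threshold) matches the binary label."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def select_threshold(scores, labels, num_steps=1000):
    """Sweep thresholds from min(scores) to max(scores) on an even grid.

    Returns the first threshold that reaches the highest accuracy, together
    with that accuracy. The grid resolution is an assumption of this sketch.
    """
    lo, hi = min(scores), max(scores)
    best_threshold, best_accuracy = lo, accuracy(scores, labels, lo)
    for i in range(1, num_steps + 1):
        t = lo + (hi - lo) * i / num_steps
        acc = accuracy(scores, labels, t)
        if acc > best_accuracy:  # strict comparison keeps the earliest best value
            best_threshold, best_accuracy = t, acc
    return best_threshold, best_accuracy


# Hypothetical toy scores and hand-assigned labels, not taken from the paper.
scores = [10.0, 250.0, 900.0, 7000.0, 8200.0, 12000.0]
labels = [0, 0, 0, 1, 1, 1]

threshold, acc = select_threshold(scores, labels)
print(f"selected threshold: {threshold:.2f}, accuracy: {acc:.2f}")
```

On separable toy data such as the above, any threshold between the largest score of class 0 and the smallest score of class 1 gives perfect accuracy, so the selected value depends on the grid and the tie-breaking rule.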