A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods. Our code is provided at https://github.com/sandylaker/soco.git. Through extensive validation and benchmarking, we verify the correctness of the proposed metrics and showcase our metrics' potential to shed light on existing attribution methods."
Researcher Affiliation | Academia | Yawei Li* (LMU Munich; Munich Center for Machine Learning), Yang Zhang* (National University of Singapore), Kenji Kawaguchi (National University of Singapore), Ashkan Khakzar (University of Oxford), Bernd Bischl (LMU Munich; Munich Center for Machine Learning), Mina Rezaei (LMU Munich; Munich Center for Machine Learning)
Pseudocode | Yes | Algorithm 1: Soundness evaluation at predictive level v; Algorithm 2: Completeness evaluation at attribution threshold t; Algorithm 3: Soundness evaluation with accuracy (s_m) as performance indicator; Algorithm 4: Completeness evaluation
Open Source Code | Yes | "Our code is provided at https://github.com/sandylaker/soco.git."
Open Datasets | Yes | "We perturb 70% of pixels in each image in the CIFAR-10 (Krizhevsky et al., 2009a) training and test datasets." ... "create two Semi-natural Datasets D_S^(1) and D_S^(2) from CIFAR-100 (Krizhevsky et al., 2009b)." ... "employ a VGG16 (Simonyan & Zisserman, 2015) pre-trained on ImageNet (Deng et al., 2009) and conduct feature attribution on the ImageNet validation set."
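The perturbation quoted above (replacing 70% of the pixels in each image) can be sketched in plain Python. The paper's exact perturbation scheme is not reproduced here; replacing selected pixels with uniform random noise, and the function and parameter names, are illustrative assumptions.

```python
import random

def perturb_pixels(image, fraction=0.7, seed=0):
    """Replace a random fraction of pixel values with uniform noise.

    `image` is a flat list of 8-bit grayscale values. The choice of
    uniform-noise replacement is an assumption for illustration; the
    paper only states that 70% of pixels are perturbed.
    """
    rng = random.Random(seed)
    n = len(image)
    k = int(round(fraction * n))          # number of pixels to perturb
    idx = rng.sample(range(n), k)         # pixel positions, no repeats
    out = list(image)                     # leave the input untouched
    for i in idx:
        out[i] = rng.randint(0, 255)      # overwrite with random value
    return out
```

For a CIFAR-10 image (32x32x3), `n` would be 3072 and `k` would be 2150; the same idea applies per channel or per pixel depending on how "pixel" is defined.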
Dataset Splits | Yes | "We perturb 70% of pixels in each image in the CIFAR-10 (Krizhevsky et al., 2009a) training and test datasets." ... "employ a VGG16 (Simonyan & Zisserman, 2015) pre-trained on ImageNet (Deng et al., 2009) and conduct feature attribution on the ImageNet validation set."
Hardware Specification | No | The paper does not explicitly describe the hardware used (e.g., GPU or CPU models); it only reports model-training details.
Software Dependencies | No | "We use the implementations of Grad-CAM, DeepSHAP, IG, and IG ensembles in Captum (Kokhlikyan et al., 2020)." The paper names a software library but gives no version numbers for it or for any other key software component.
Experiment Setup | Yes | "Training is conducted using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 0.001 and weight decay of 0.0001. The batch size for training is 256, and we train the model for 35 epochs."
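The reported optimizer settings correspond to a standard Adam update with L2 weight decay folded into the gradient. A minimal single-step sketch, using the paper's lr=0.001 and wd=0.0001 (the beta and epsilon values are the common Adam defaults, assumed here since the paper does not state them):

```python
def adam_step(theta, grad, m, v, t, lr=1e-3, wd=1e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over parameter list `theta` at step t (1-based).

    Weight decay is applied as an L2 term added to the gradient, the
    behavior of classic Adam implementations; betas/eps are assumed
    defaults, not values stated in the paper.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        g = g + wd * p                           # L2 weight decay
        m_i = beta1 * m_i + (1 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)           # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        p = p - lr * m_hat / (v_hat ** 0.5 + eps)
        new_theta.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v
```

In practice this whole loop is a single call such as an optimizer step in a deep-learning framework with the same hyperparameters; the sketch only makes the per-parameter arithmetic explicit.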