A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods. Our code is provided at https://github.com/sandylaker/soco.git. Through extensive validation and benchmarking, we verify the correctness of the proposed metrics and showcase our metrics' potential to shed light on existing attribution methods."
Researcher Affiliation | Academia | Yawei Li* (LMU Munich; Munich Center for Machine Learning), Yang Zhang* (National University of Singapore), Kenji Kawaguchi (National University of Singapore), Ashkan Khakzar (University of Oxford), Bernd Bischl (LMU Munich; Munich Center for Machine Learning), Mina Rezaei (LMU Munich; Munich Center for Machine Learning)
Pseudocode | Yes | Algorithm 1: Soundness evaluation at predictive level v; Algorithm 2: Completeness evaluation at attribution threshold t; Algorithm 3: Soundness evaluation with accuracy (s_m) as performance indicator; Algorithm 4: Completeness evaluation
Open Source Code | Yes | "Our code is provided at https://github.com/sandylaker/soco.git."
Open Datasets | Yes | "We perturb 70% of pixels in each image in the CIFAR-10 (Krizhevsky et al., 2009a) training and test datasets." ... "create two Semi-natural Datasets D_S^(1) and D_S^(2) from CIFAR-100 (Krizhevsky et al., 2009b)." ... "employ a VGG16 (Simonyan & Zisserman, 2015) pre-trained on ImageNet (Deng et al., 2009) and conduct feature attribution on the ImageNet validation set."
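The perturbation quoted above (replacing 70% of the pixels in each image) can be sketched in plain Python. The paper's exact perturbation scheme is not reproduced here; replacing selected pixels with uniform random noise, and the function and parameter names, are illustrative assumptions.

```python
import random

def perturb_pixels(image, fraction=0.7, seed=0):
    """Replace a random fraction of pixel values with uniform noise.

    `image` is a flat list of 8-bit grayscale values. The choice of
    uniform-noise replacement is an assumption for illustration; the
    paper only states that 70% of pixels are perturbed.
    """
    rng = random.Random(seed)
    n = len(image)
    k = int(round(fraction * n))          # number of pixels to perturb
    idx = rng.sample(range(n), k)         # pixel positions, no repeats
    out = list(image)                     # leave the input untouched
    for i in idx:
        out[i] = rng.randint(0, 255)      # overwrite with random value
    return out
```

For a CIFAR-10 image (32x32x3), `n` would be 3072 and `k` would be 2150; the same idea applies per channel or per pixel depending on how "pixel" is defined.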
Dataset Splits | Yes | "We perturb 70% of pixels in each image in the CIFAR-10 (Krizhevsky et al., 2009a) training and test datasets." ... "employ a VGG16 (Simonyan & Zisserman, 2015) pre-trained on ImageNet (Deng et al., 2009) and conduct feature attribution on the ImageNet validation set."
Hardware Specification | No | The paper does not explicitly describe the hardware used (e.g., GPU or CPU models); it only reports model-training details.
Software Dependencies | No | "We use the implementations of Grad-CAM, DeepSHAP, IG, and IG ensembles in Captum (Kokhlikyan et al., 2020)." The paper names a software library but gives no version numbers for it or for any other key software component.
Experiment Setup | Yes | "Training is conducted using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 0.001 and weight decay of 0.0001. The batch size for training is 256, and we train the model for 35 epochs."
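The reported optimizer settings correspond to a standard Adam update with L2 weight decay folded into the gradient. A minimal single-step sketch, using the paper's lr=0.001 and wd=0.0001 (the beta and epsilon values are the common Adam defaults, assumed here since the paper does not state them):

```python
def adam_step(theta, grad, m, v, t, lr=1e-3, wd=1e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over parameter list `theta` at step t (1-based).

    Weight decay is applied as an L2 term added to the gradient, the
    behavior of classic Adam implementations; betas/eps are assumed
    defaults, not values stated in the paper.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        g = g + wd * p                           # L2 weight decay
        m_i = beta1 * m_i + (1 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)           # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        p = p - lr * m_hat / (v_hat ** 0.5 + eps)
        new_theta.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v
```

In practice this whole loop is a single call such as an optimizer step in a deep-learning framework with the same hyperparameters; the sketch only makes the per-parameter arithmetic explicit.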