Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

WeightedSHAP: analyzing and improving Shapley based feature attributions

Authors: Yongchan Kwon, James Y. Zou

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.
Researcher Affiliation | Collaboration | Yongchan Kwon, Columbia University, New York, NY 10027 (EMAIL); James Zou, Stanford University and Amazon AWS, Stanford, CA 94305 (EMAIL)
Pseudocode | Yes | We provide a pseudo algorithm in Appendix.
Open Source Code | Yes | All the missing details about numerical experiments are provided in Appendix, and our Python-based implementations are available at https://github.com/ykwon0407/WeightedSHAP.
Open Datasets | No | The paper names standard datasets ('boston', 'airfoil', 'whitewine', 'abalone', 'fraud', 'phoneme', 'wind', 'cpu', and 'MNIST') but does not provide access links, DOIs, repositories, or explicit citations for them in the main text.
Dataset Splits | No | The paper mentions evaluating on 'held-out test samples' but does not specify the train/validation/test splits, percentages, or cross-validation details needed to reproduce the data partitioning.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] It is not the main focus of our paper.
Software Dependencies | No | The paper mentions 'Python-based implementations' and the use of a 'gradient boosting model' and a 'multilayer perceptron model', but does not provide version numbers for Python or any key libraries.
Experiment Setup | Yes | As for the surrogate model in coalition function estimation v(cond), we use a multilayer perceptron model with two hidden layers, each with 128 neurons and the ELU activation function [Clevert et al., 2015].
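The Research Type row quotes the paper's core claim: WeightedSHAP generalizes the Shapley value by reweighting marginal contributions rather than averaging them uniformly. The following is a minimal illustrative sketch of that weighted-average view, not the authors' implementation; the toy value function `v`, the two-feature setup, and the choice of weights are assumptions made only for this example. With uniform weights over coalition sizes, the exact Shapley value is recovered:

```python
# Illustrative sketch (not the WeightedSHAP repo's API): attributions as a
# weighted average of per-coalition-size marginal contributions.
from itertools import combinations

def marginal_contributions_by_size(value_fn, d):
    """out[size][j] = average of value_fn(S | {j}) - value_fn(S)
    over coalitions S of the given size that do not contain feature j."""
    out = [[0.0] * d for _ in range(d)]
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for size in range(d):
            subsets = list(combinations(others, size))
            mc = sum(value_fn(set(S) | {j}) - value_fn(set(S)) for S in subsets)
            out[size][j] = mc / len(subsets)
    return out

def weighted_attribution(value_fn, d, weights):
    """Combine per-size marginal contributions with weights over coalition
    sizes. Uniform weights (1/d each) give the exact Shapley value; a
    non-uniform choice of weights yields a WeightedSHAP-style attribution."""
    mcs = marginal_contributions_by_size(value_fn, d)
    return [sum(w * mcs[size][j] for size, w in enumerate(weights))
            for j in range(d)]

# Toy (hypothetical) value function over features {0, 1} with an interaction.
def v(S):
    return 2.0 * (0 in S) + 1.0 * (1 in S) + 0.5 * (0 in S and 1 in S)

d = 2
shap = weighted_attribution(v, d, [0.5, 0.5])  # uniform weights -> Shapley
# -> [2.25, 1.25]; the attributions sum to v({0, 1}) = 3.5 (efficiency)
```

In the paper, the weights over coalition sizes are optimized rather than fixed, and the coalition function is estimated with a surrogate model (a two-hidden-layer MLP with 128 ELU units per layer, per the Experiment Setup row); here the weights and value function are hard-coded purely for illustration.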