Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

WeightedSHAP: analyzing and improving Shapley based feature attributions

Authors: Yongchan Kwon, James Y. Zou

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.
Researcher Affiliation | Collaboration | Yongchan Kwon, Columbia University, New York, NY 10027 (EMAIL); James Zou, Stanford University and Amazon AWS, Stanford, CA 94305 (EMAIL)
Pseudocode | Yes | We provide a pseudo algorithm in Appendix.
Open Source Code | Yes | All the missing details about numerical experiments are provided in Appendix, and our Python-based implementations are available at https://github.com/ykwon0407/WeightedSHAP.
Open Datasets | No | The paper names standard datasets ('boston', 'airfoil', 'whitewine', 'abalone', 'fraud', 'phoneme', 'wind', 'cpu', and 'MNIST') but does not provide access links, DOIs, repositories, or explicit citations for them in the main text.
Dataset Splits | No | The paper mentions evaluating on 'held-out test samples' but does not specify the train/validation/test splits, percentages, or cross-validation details needed to reproduce the data partitioning.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] It is not the main focus of our paper.
Software Dependencies | No | The paper mentions 'Python-based implementations' and the use of a 'gradient boosting model' and a 'multilayer perceptron model', but does not provide version numbers for Python or any key libraries.
Experiment Setup | Yes | As for the surrogate model in coalition function estimation v(cond), we use a multilayer perceptron model with two hidden layers, each with 128 neurons and the ELU activation function [Clevert et al., 2015].
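The Research Type row quotes the paper's core claim: WeightedSHAP generalizes the Shapley value by reweighting marginal contributions rather than averaging them uniformly. The following is a minimal illustrative sketch of that weighted-average view, not the authors' implementation; the toy value function `v`, the two-feature setup, and the choice of weights are assumptions made only for this example. With uniform weights over coalition sizes, the exact Shapley value is recovered:

```python
# Illustrative sketch (not the WeightedSHAP repo's API): attributions as a
# weighted average of per-coalition-size marginal contributions.
from itertools import combinations

def marginal_contributions_by_size(value_fn, d):
    """out[size][j] = average of value_fn(S | {j}) - value_fn(S)
    over coalitions S of the given size that do not contain feature j."""
    out = [[0.0] * d for _ in range(d)]
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for size in range(d):
            subsets = list(combinations(others, size))
            mc = sum(value_fn(set(S) | {j}) - value_fn(set(S)) for S in subsets)
            out[size][j] = mc / len(subsets)
    return out

def weighted_attribution(value_fn, d, weights):
    """Combine per-size marginal contributions with weights over coalition
    sizes. Uniform weights (1/d each) give the exact Shapley value; a
    non-uniform choice of weights yields a WeightedSHAP-style attribution."""
    mcs = marginal_contributions_by_size(value_fn, d)
    return [sum(w * mcs[size][j] for size, w in enumerate(weights))
            for j in range(d)]

# Toy (hypothetical) value function over features {0, 1} with an interaction.
def v(S):
    return 2.0 * (0 in S) + 1.0 * (1 in S) + 0.5 * (0 in S and 1 in S)

d = 2
shap = weighted_attribution(v, d, [0.5, 0.5])  # uniform weights -> Shapley
# -> [2.25, 1.25]; the attributions sum to v({0, 1}) = 3.5 (efficiency)
```

In the paper, the weights over coalition sizes are optimized rather than fixed, and the coalition function is estimated with a surrogate model (a two-hidden-layer MLP with 128 ELU units per layer, per the Experiment Setup row); here the weights and value function are hard-coded purely for illustration.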