reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rethinking Shapley Value for Negative Interactions in Non-convex Games

Authors: Wonjoon Chang, Myeongjin Lee, Jaesik Choi

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments are conducted on image classifiers (VGG19 (Simonyan & Zisserman, 2014), ResNet50 (He et al., 2016)) trained on the ImageNet dataset (Deng et al., 2009), and a sentence classifier (BERT (Devlin, 2018)) trained on the IMDB Review dataset (Maas et al., 2011).
Researcher Affiliation	Academia	Wonjoon Chang, Myeongjin Lee: Korea Advanced Institute of Science and Technology (KAIST). Jaesik Choi: Korea Advanced Institute of Science and Technology (KAIST), INEEJI.
Pseudocode	Yes	Algorithm 1 Approximation for Aggregated Positive Interactions
Open Source Code	No	No explicit statement about open-source code release or repository link is provided in the paper.
Open Datasets	Yes	Our experiments are conducted on image classifiers (VGG19 (Simonyan & Zisserman, 2014), ResNet50 (He et al., 2016)) trained on the ImageNet dataset (Deng et al., 2009), and a sentence classifier (BERT (Devlin, 2018)) trained on the IMDB Review dataset (Maas et al., 2011).
Dataset Splits	No	The paper mentions using 'ImageNet dataset' and 'IMDB Review dataset' but does not explicitly provide training/test/validation dataset splits or refer to specific standard splits used for reproduction.
Hardware Specification	Yes	We observed that stable results can be achieved with as few as 30 permutations, which took around 3 minutes on an RTX 6000 GPU.
Software Dependencies	No	The paper mentions models and methods like VGG19, ResNet50, BERT, Kernel SHAP, and Integrated Gradients, but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries).
Experiment Setup	Yes	We sample 100 permutations for the image classifiers and 300 permutations for the sentence classifier. In the case of ImageNet data, we convert images into 20 × 20 patches for feasible computation. For Kernel SHAP (Lundberg, 2017), we use 40,000 samples, as our method approximates using 100 permutations across 400 features. Integrated Gradients (IG) (Sundararajan et al., 2017) is computed with 100 steps, and the attributions are summed for all pixels within each patch.
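
The numbers in the Experiment Setup row fit together: a 20 × 20 patch grid gives 400 features, and 100 permutations across those 400 features gives the 40,000 samples used for Kernel SHAP. The sketch below reproduces that arithmetic and shows one plausible way to sum per-pixel attributions within each patch, as described for Integrated Gradients. The helper `sum_attributions_per_patch` and its divisibility assumption are hypothetical illustrations, not code from the paper.

```python
# Budget arithmetic using the values quoted in the Experiment Setup row.
GRID = 20                 # images are split into a 20 x 20 grid of patches
NUM_PERMUTATIONS = 100    # permutations sampled for the image classifiers

num_features = GRID * GRID                      # 400 patch features
evaluations = NUM_PERMUTATIONS * num_features   # 40,000, the Kernel SHAP sample count


def sum_attributions_per_patch(pixel_attr, grid):
    """Sum a 2-D list of per-pixel attributions into a grid x grid list of patch totals.

    Hypothetical helper: assumes the image height and width are divisible by `grid`.
    """
    height, width = len(pixel_attr), len(pixel_attr[0])
    patch_h, patch_w = height // grid, width // grid
    patches = [[0.0] * grid for _ in range(grid)]
    for r in range(height):
        for c in range(width):
            # Integer division maps each pixel to the patch that contains it.
            patches[r // patch_h][c // patch_w] += pixel_attr[r][c]
    return patches
```

For a 224 × 224 input and `grid=20` the divisibility assumption fails (224 / 20 is not an integer), so the image would have to be resized or padded first; the paper excerpt does not say how this is handled.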