Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property
Authors: Yuya Yoshikawa, Masanari Kimura, Ryotaro Shimizu, Yuki Saito
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on two tasks in the image and text domains to evaluate the effectiveness of the proposed method, referred to as C2FA, implemented with Algorithm 1 in Appendix A. Its hyperparameters are provided in Appendix B. Comparing Methods. We used five methods for comparison: LIME [Ribeiro et al., 2016], MILLI [Early et al., 2022], Bottom-Up LIME (BU-LIME), Top-Down LIME (TD-LIME), and Top-Down MILLI (TD-MILLI). |
| Researcher Affiliation | Collaboration | Yuya Yoshikawa¹, Masanari Kimura², Ryotaro Shimizu³ and Yuki Saito³ — ¹STAIR Lab, Chiba Institute of Technology; ²School of Mathematics and Statistics, The University of Melbourne; ³ZOZO Research. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Estimating consistent two-level feature attributions (C2FA) with ℓ2 regularization |
| Open Source Code | No | The text mentions "In the official SHAP library [shap (Git Hub), 2024]" which refers to a third-party tool, not the authors' own code for the proposed methodology. There is no explicit statement or link provided for the code of the method described in this paper. |
| Open Datasets | Yes | We constructed an MIL dataset from the Pascal VOC semantic segmentation dataset [Everingham et al., 2015] with the ground-truth instance- and pixel-level labels. We constructed a dataset in which the validation and test sets contain 500 and 1,000 product reviews, respectively, randomly sampled from the Amazon reviews dataset [Zhang et al., 2015]. |
| Dataset Splits | Yes | The number of samples in the training, validation, and test subsets is 5,000, 1,000, and 2,000, respectively, and the ratio of positive to negative samples is equal. We constructed a dataset in which the validation and test sets contain 500 and 1,000 product reviews, respectively, randomly sampled from the Amazon reviews dataset [Zhang et al., 2015]. |
| Hardware Specification | Yes | The experiments were conducted on a server with an Intel Xeon Gold 6148 CPU and an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, BERT, ResNet-50, and Hugging Face, but does not provide version numbers for these components. For example, it mentions "Adam optimizer [Kingma and Ba, 2015]" without specifying the version used. |
| Experiment Setup | Yes | We trained the model using our MIL image classification dataset with the Adam optimizer [Kingma and Ba, 2015] with a learning rate of 0.001, a batch size of 32, and a maximum of 300 epochs. The hyperparameters of C2FA, λH, λL, and µ2, were tuned using the validation subset of each dataset within the following ranges: λH, λL ∈ {0.1, 1} and µ2 ∈ {0.001, 0.01, 0.1}. The remaining hyperparameters were set to µ1 = 0.1 and ϵ1 = ϵ2 = 10⁻⁴. |
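The tuning protocol quoted in the Experiment Setup row amounts to a small grid search over λH, λL, and µ2 with µ1 and the ϵ tolerances held fixed. A minimal sketch of that enumeration is below; the variable and key names are illustrative assumptions, not the authors' code, and the `validate` step is left to whatever scoring the validation subset provides.

```python
from itertools import product

# Ranges reported in the paper's setup (Appendix B, as quoted above).
LAMBDA_H = [0.1, 1]
LAMBDA_L = [0.1, 1]
MU_2 = [0.001, 0.01, 0.1]

# Hyperparameters the paper fixes rather than tunes.
FIXED = {"mu_1": 0.1, "eps_1": 1e-4, "eps_2": 1e-4}


def candidate_settings():
    """Enumerate every tuned combination, merged with the fixed values."""
    for lam_h, lam_l, mu_2 in product(LAMBDA_H, LAMBDA_L, MU_2):
        yield {"lambda_H": lam_h, "lambda_L": lam_l, "mu_2": mu_2, **FIXED}


settings = list(candidate_settings())
# 2 x 2 x 3 = 12 candidate settings to evaluate per dataset's validation subset.
print(len(settings))
```

In practice each of the 12 settings would be scored on the validation subset and the best-scoring one kept, which is the standard pattern this report's quoted ranges imply.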