BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation
Authors: Oren Barkan, Yehonatan Elisha, Jonathan Weill, Noam Koenigstein
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations across various model architectures showcase the superior performance of BEE in comparison to state-of-the-art explanation methods on a variety of objective evaluation metrics. Our experiments aim to address the following research questions (RQs): 1) Does the BEE method outperform state-of-the-art methods? 2) Does BEE finetuning improve upon pretraining? 3) Do different metrics favor different explanation maps and baselines? 4) How does the number of sampled baselines T affect BEE performance? 5) How does the performance of adaptive baseline sampling compare to non-adaptive sampling? 6) Does the learned baseline distribution obtained by BEE converge to the best-performing baseline distribution per metric? 7) Does integration on intermediate representation gradients improve upon integration on input gradients? 8) What is the contribution from context modeling in BEE? 9) Can other path-integration methods benefit from BEE? The primary manuscript addresses RQs 1-6 comprehensively. Specifically, RQs 1-2 are addressed in Tabs. 1 and 2, RQ 3 is addressed in Tab. 3 and Fig. 2, and RQs 4-6 are addressed in Fig. 2. Due to space limitations, experiments addressing RQs 7-9, along with additional analyses and ablation studies, are provided in the Appendix. |
| Researcher Affiliation | Academia | Oren Barkan1*, Yehonatan Elisha2*, Jonathan Weill2, Noam Koenigstein2 1The Open University, Israel 2Tel Aviv University, Israel |
| Pseudocode | No | The paper describes the steps of the BEE procedure in Section 3.2 using a numbered list: 1. For each z ∈ B, draw w_z from a normal distribution...; 2. Draw a baseline b...; 3. Compute the metric score...; 4. u*, θ* = argmin_{u,θ} −log σ(y_u^⊤ c_θ(x)) + (1/2) Σ_{i=1}^{K} q_i^b (u_i − g_i^b)²; 5. g^b ← u*, θ ← θ*; 6. q_i^b ← q_i^b + σ(g^{b⊤} c_θ(x)) σ(−g^{b⊤} c_θ(x)) c_θ(x)_i². However, it does not use a dedicated 'Pseudocode' or 'Algorithm' block with formal pseudocode formatting. |
| Open Source Code | Yes | Code: https://github.com/yonisGit/BEE |
| Open Datasets | Yes | In accordance with previous works (Kapishnikov et al. 2019, 2021; Xu, Venugopalan, and Sundararajan 2020; Chefer, Gur, and Wolf 2021b) we use the ImageNet (Deng et al. 2009) ILSVRC 2012 (IN) validation set as our test set, which contains 50,000 images from 1,000 classes. |
| Dataset Splits | Yes | We use the ImageNet (Deng et al. 2009) ILSVRC 2012 (IN) validation set as our test set, which contains 50,000 images from 1,000 classes. For the pretraining phase, we used a separate training set of 5,000 examples taken from the IN training set, avoiding overlap with the validation set used as a test set. |
| Hardware Specification | Yes | The experiments were conducted on an NVIDIA DGX 8x A100 Server. |
| Software Dependencies | No | The paper mentions that "Optimization in both the pretraining and finetuning phases was carried out using the Adam optimizer," but it does not name the software frameworks or libraries used, nor any version numbers. |
| Experiment Setup | Yes | Unless stated otherwise, we sampled T = 8 baselines per test instance, and n = 10 interpolation steps in the integration process (Eq. 3). The integration was employed on the last convolutional / attention layer, i.e., we set I = {L} (Eq. 4). A comparison of various settings of I, including L−1 and L−2, is presented in the Appendix. The dimension of the context representation K was set to match the output dimension of each backbone separately. Optimization in both the pretraining and finetuning phases was carried out using the Adam optimizer. For precise optimization details, please refer to the Appendix and our GitHub repository. |
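The setup described above (T sampled baselines, n interpolation steps, metric-driven baseline selection) can be sketched in miniature. The following is an illustrative NumPy toy, not the authors' implementation: the logistic "model", the attribution-sum "metric", and the softmax exploration-exploitation sampling over candidate baselines are all simplifying assumptions; BEE itself learns a baseline distribution online and integrates over intermediate-layer gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": f(x) = sigmoid(w . x), with an analytic gradient.
w = np.array([2.0, -1.0, 0.5])

def f(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def grad_f(x):
    s = f(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, n=10):
    """Midpoint-rule approximation of path-integrated gradients along the
    straight line from `baseline` to `x` (the role of Eq. 3, here on input
    gradients rather than intermediate representations)."""
    alphas = (np.arange(n) + 0.5) / n
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

def bee_explanation(x, candidate_baselines, metric, T=8, tau=1.0):
    """Exploration-exploitation sketch: score each candidate baseline with the
    evaluation metric, sample T baselines with softmax probabilities, and
    average the resulting attribution maps. A stand-in for BEE's learned,
    adaptively updated baseline distribution."""
    scores = np.array(
        [metric(integrated_gradients(x, b)) for b in candidate_baselines]
    )
    probs = np.exp(scores / tau)
    probs /= probs.sum()
    idx = rng.choice(len(candidate_baselines), size=T, p=probs)
    maps = [integrated_gradients(x, candidate_baselines[i]) for i in idx]
    return np.mean(maps, axis=0)
```

One useful sanity check on the sketch is the completeness property of path integration: the attributions for a single baseline should sum to f(x) − f(baseline), which holds here up to the Riemann-sum discretization error.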