Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Authors: Buelent Uendes, Shujian Yu, Mark Hoogendoorn
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on vision and time-series tasks demonstrate that StartGrad enhances the optimization process of various state-of-the-art mask-explanation methods by reaching target metrics faster and, in some cases, boosting their overall performance. |
| Researcher Affiliation | Academia | Buelent Uendes, Shujian Yu & Mark Hoogendoorn, Vrije Universiteit Amsterdam, The Netherlands, {b.uendes}@vu.nl |
| Pseudocode | Yes | Algorithm 1 gives pseudocode for our gradient-based initialization method which we call StartGrad. Algorithm 1: Gradient-based Mask Initialization (StartGrad) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/BuelentUendes/StartGrad |
| Open Datasets | Yes | For quantitative evaluation, we use the conciseness-preciseness (CP) Pixel and L1 scores introduced by Kolek et al. (2023), calculated on 500 random samples from the ImageNet validation dataset (Deng et al., 2009). Following Tonekaboni et al. (2020), Crabbé & Van Der Schaar (2021) and Enguehard (2023), we test StartGrad on two commonly used synthetic benchmark datasets, i.e. state and switch-feature data, both of which use a hidden Markov model (HMM) to generate the data. |
| Dataset Splits | Yes | calculated on 500 random samples from the ImageNet validation dataset (Deng et al., 2009). For both synthetic datasets, we use the same experimental setup as in previous studies (Crabbé & Van Der Schaar, 2021; Enguehard, 2023; Liu et al., 2024), i.e. we generate 1,000 samples and train the classifier on 800 training examples, and evaluate the performance on the remaining 200 samples while reporting results across five folds. |
| Hardware Specification | Yes | On an Apple MacBook Pro with an M1 chip, we ran the StartGrad algorithm for all three vision models, i.e. Pixel Mask (Fong & Vedaldi, 2017), WaveletX (Kolek et al., 2023) and ShearletX (Kolek et al., 2023), across 100 randomly selected ImageNet validation samples (Deng et al., 2009) and obtain the following average runtime (in seconds): |
| Software Dependencies | No | The paper mentions software like "Adam optimizer" but does not provide specific version numbers for any libraries or frameworks used (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For all vision experiments, we optimize the mask coefficients for 300 steps, using an Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-1. In line with Kolek et al. (2023), we set for ShearletX λ1 = 1 and λ2 = 2. For WaveletX we set λ1 = 1 and λ2 = 10. For all the experiments, we fit a single-layer one-directional GRU (Cho et al., 2014) as a baseline classification model with hidden dimension 200 and train it for 50 epochs, batch size 128, learning rate of 1e-4 with the Adam (Kingma & Ba, 2014) optimizer. |
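The two quoted ingredients (a gradient-based mask initialization, followed by iterative mask optimization) can be sketched in a few lines. This is a toy illustration, not the paper's implementation: it uses a hypothetical logistic "classifier" with an analytic gradient, min-max normalization of the gradient magnitudes (the paper's exact normalization may differ), a single L1 sparsity weight in place of the λ1/λ2 regularizers, and plain gradient descent where the paper uses Adam with learning rate 1e-1 for 300 steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "classifier": logistic regression on 16 features
# (a stand-in for the vision/time-series models in the paper).
w = rng.normal(size=16)

def predict(x):
    return 1.0 / (1.0 + np.exp(-x @ w))

x = rng.normal(size=16)   # input to explain
target = predict(x)       # model output on the unmasked input

# --- StartGrad-style initialization (sketch) ------------------------
# Gradient of the model output w.r.t. the input; for the logistic
# model this is p * (1 - p) * w. The mask starts proportional to the
# normalized gradient magnitude instead of a uniform constant.
p = predict(x)
g = np.abs(p * (1.0 - p) * w)
mask = (g - g.min()) / (g.max() - g.min() + 1e-12)  # min-max (assumption)

# --- Mask optimization (simplified masking objective) ---------------
# Minimize (f(mask * x) - f(x))^2 + lam * ||mask||_1 by gradient
# descent, keeping the mask clipped to [0, 1]; 300 steps as quoted.
lam, lr = 0.01, 0.1
for _ in range(300):
    pm = predict(mask * x)
    # chain rule for the squared fidelity term w.r.t. the mask
    grad_fid = 2.0 * (pm - target) * pm * (1.0 - pm) * w * x
    mask = np.clip(mask - lr * (grad_fid + lam * np.sign(mask)), 0.0, 1.0)
```

The point of the initialization step is that coordinates with large output gradients start with large mask values, so the optimizer begins near a plausible explanation rather than from a uniform mask, which is the speed-up the report's "Research Type" row describes.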