One Wave To Explain Them All: A Unifying Perspective On Feature Attribution

Authors: Gabriel Kasmi, Amandine Brunetto, Thomas Fel, Jayneel Parekh

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the superiority of WAM through extensive empirical evaluations across a diverse set of metrics, model architectures, and datasets, showing its robustness and versatility. We also discuss the novel insights and connections that our method permits, such as connecting feature attribution to the characterization of a model's robustness, or filtering the important part of an audio signal without requiring the training of an explanation method. Table 1 presents the performance of WAM. Our method consistently outperforms traditional attribution methods, especially for images and volumes, while achieving competitive results for audio.
Researcher Affiliation | Collaboration | 1Mines Paris PSL University, Paris, France; 2RTE France, Paris La Défense, France; 3Kempner Institute, Harvard University, Cambridge, MA, United States; 4ISIR, Sorbonne Université, Paris, France. Correspondence to: Gabriel Kasmi <gabriel.kasmi[at]minesparis.psl.eu>.
Pseudocode | Yes | Figure 10. Pseudo-code for adding Gaussian noise to audio.
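The paper's Figure 10 pseudo-code is not reproduced in this report. As a rough illustration of the operation it names, a Gaussian perturbation of an audio waveform can be sketched in NumPy as below; the function name, default values, and the assumption that noise is added to the raw signal (rather than, say, to its wavelet coefficients) are ours, not the paper's.

```python
import numpy as np

def add_gaussian_noise(signal: np.ndarray, sigma: float = 0.01,
                       n_samples: int = 8, seed: int = 0) -> np.ndarray:
    """Draw n_samples noisy copies of a 1-D audio signal.

    Each copy is signal + eps with eps ~ N(0, sigma^2), the perturbation
    used by SmoothGrad-style attribution estimators.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, signal.size))
    return signal[None, :] + noise
```

In a SmoothGrad-style pipeline, each noisy copy would then be passed through the classifier and the resulting gradients averaged.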
Open Source Code | No | Project page: https://gabrielkasmi.github.io/wam/. (The URL is a project page, not a direct code-repository link, and the paper makes no explicit statement about releasing code for the described methodology.)
Open Datasets | Yes | To address this gap, we designed a comprehensive evaluation framework spanning diverse datasets: ESC-50 (Piczak, 2015) for audio, ImageNet (Russakovsky et al., 2015) for images, and MedMNIST3D (Yang et al., 2023) for volumes.
Dataset Splits | Yes | Evaluations are conducted on 400 samples from ESC-50 (fold 1), 1,000 images from ImageNet's validation set, and the full AdrenalMNIST3D test set (298 samples). For volumes, we consider the LATEC benchmark (Klein et al., 2024) and evaluate our method on two MedMNIST (Yang et al., 2023) datasets: AdrenalMNIST3D and VesselMNIST. We carry out our evaluation over the complete test set.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory amounts) are mentioned in the paper for running its experiments.
Software Dependencies | No | We use the Python library Captum (Kokhlikyan et al., 2020) to consistently implement existing methods on our audio and image datasets. All models are retrieved from the PyTorch Image Models (Wightman, 2019) repository. (While software names are mentioned, specific version numbers are not provided for reproducibility.)
Experiment Setup | Yes | The number of samples, n, needed to compute the approximation of the smoothed gradient and the standard deviation σ are hyperparameters. In practice, we employ the Nadam (Dozat, 2016) optimizer, which combines the benefits of Nesterov acceleration and Adam optimization. Our approach consistently produces masks with controllable sparsity levels up to 90%, meaning that 90% of the wavelet coefficients are zeroed out, while maintaining a classification score comparable to or better than the original prediction. Results are averaged across 1,000 images, optimized for 500 steps, with α ranging over [0, 100] for each image.
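Concretely, the smoothed gradient referred to above is a Monte Carlo average of gradients taken at Gaussian-perturbed inputs, with n and σ as the two hyperparameters. A minimal NumPy sketch, substituting a toy analytic gradient for a trained network (the helper name and signature are illustrative, not the paper's API):

```python
import numpy as np

def smoothed_gradient(grad_fn, x, sigma, n, seed=0):
    """Monte Carlo estimate of E[grad f(x + eps)], eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    grads = [grad_fn(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n)]
    return np.mean(grads, axis=0)

# Toy check: for f(x) = ||x||^2 the true gradient is 2x,
# and the smoothed estimate approaches it as n grows.
x = np.array([1.0, -2.0])
g = smoothed_gradient(lambda v: 2.0 * v, x, sigma=0.1, n=256)
```

Larger n reduces estimator variance at linear cost in forward/backward passes, while σ trades off attribution smoothness against fidelity to the unperturbed input.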