Unlearning-based Neural Interpretations

Authors: Ching Lam Choi, Alexandre Duplessis, Serge Belongie

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify this local smoothing effect by measuring the normal curvature of the model function before and after unlearning; we also demonstrate that unlearning makes attributions resistant to perturbative attacks. Our contributions can be summarised as follows: ... We empirically show that present reliance on static baselines imposes undesirable post-hoc biases ... We visually, numerically and formally establish the utility of UNI as a means to compute robust, meaningful and debiased image attributions. ... We experiment on ImageNet-1K (Deng et al., 2009) and ImageNet-C (Hendrycks & Dietterich, 2019) and compare against various path-based and gradient-based attribution methods. ... We report MuFidelity scores (Bhatt et al., 2021) ... We evaluate with a step size of 10% and average over 10,000 random image samples ... We report robustness results using two distance measures: Spearman correlation coefficient (Table 5) and top-k pixel intersection score (Table 6), pre and post attack.
Researcher Affiliation | Academia | Ching Lam Choi, CSAIL, Department of EECS, Massachusetts Institute of Technology; Alexandre Duplessis, Department of Computer Science, University of Oxford; Serge Belongie, Pioneer Centre for AI, University of Copenhagen
Pseudocode | Yes | Algorithm 1 UNI: unlearning direction, baseline matching and path-attribution
Open Source Code | No | The paper does not explicitly state that source code for its methodology is released, nor does it provide a link to a repository. It refers to 'open source exemplars (Fel et al., 2022a)', but this refers to related work, not the authors' own implementation.
Open Datasets | Yes | We experiment on ImageNet-1K (Deng et al., 2009), ImageNet-C (Hendrycks & Dietterich, 2019)
Dataset Splits | No | The paper mentions evaluating on '10,000 random image samples' and describes how pixels are removed or inserted for inference, which is an evaluation sampling strategy. It uses the standard ImageNet-1K and ImageNet-C datasets, but does not explicitly state which train/validation/test splits its experiments use, nor does it cite a specific split methodology for its own evaluation.
Hardware Specification | No | The paper does not specify any hardware (e.g., CPU or GPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions using 'pre-trained computer vision backbone models (Paszke et al., 2019)', where the citation refers to PyTorch, but it does not give version numbers for PyTorch or any other software dependency used in the implementation.
Experiment Setup | Yes | Unless otherwise specified, we use the following hyperparameters: unlearning step size η = 1; l2 PGD with T = 10 steps, a budget of ε = 0.25 and step size µ = 0.1; Riemann approximation with B = 15 steps.
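The reported hyperparameters can be made concrete with a small sketch. The functions below are illustrative assumptions, not the paper's released code (none is available): `riemann_path_attribution` approximates a straight-line path integral of gradients with B = 15 Riemann steps, in the style of integrated gradients, and `l2_pgd` runs T = 10 steps of l2-ball projected gradient ascent with budget ε = 0.25 and step size µ = 0.1. The name `grad_fn` stands in for the model's input-gradient oracle.

```python
import numpy as np

def riemann_path_attribution(grad_fn, x, baseline, B=15):
    """Midpoint-Riemann approximation of a straight-line path integral of
    gradients from `baseline` to `x` (integrated-gradients style).
    `grad_fn(z)` is assumed to return the model's input gradient at z."""
    diff = x - baseline
    alphas = (np.arange(B) + 0.5) / B           # B midpoint samples in (0, 1)
    grads = np.stack([grad_fn(baseline + a * diff) for a in alphas])
    return diff * grads.mean(axis=0)            # elementwise attribution map

def l2_pgd(grad_fn, x, eps=0.25, mu=0.1, T=10):
    """l2 PGD: T ascent steps of size mu along the normalised gradient,
    projecting the perturbation back into an eps-ball around x."""
    x_adv = x.copy()
    for _ in range(T):
        g = grad_fn(x_adv)
        x_adv = x_adv + mu * g / (np.linalg.norm(g) + 1e-12)
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:                          # project onto the eps-ball
            delta *= eps / norm
        x_adv = x + delta
    return x_adv
```

For a quadratic toy model f(z) = ||z||², whose gradient 2z is linear along the path, the midpoint rule is exact and the attributions sum to f(x) − f(baseline), the completeness property shared by path-based methods.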
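The two robustness measures reported pre and post attack, Spearman rank correlation and top-k pixel intersection between attribution maps, can be sketched as follows. These are generic numpy implementations of the standard metrics, assumed for illustration rather than taken from the paper:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation of two attribution maps: Pearson
    correlation of their flattened ranks (simple version; ties are
    broken by position rather than averaged)."""
    ra = np.argsort(np.argsort(a.ravel())).astype(float)
    rb = np.argsort(np.argsort(b.ravel())).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / (np.linalg.norm(ra) * np.linalg.norm(rb)))

def topk_intersection(a, b, k_frac=0.1):
    """Fraction of overlap between the top-k most-attributed pixels of
    two maps, with k a fraction (here 10%) of all pixels."""
    k = max(1, int(k_frac * a.size))
    top_a = np.argsort(a.ravel())[-k:]
    top_b = np.argsort(b.ravel())[-k:]
    return len(np.intersect1d(top_a, top_b)) / k
```

A robust attribution method keeps both scores close to 1 when computed between the maps before and after a perturbative attack; a successful attack drives them toward 0 (or −1 for the correlation).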