REPEAT: Improving Uncertainty Estimation in Representation Learning Explainability
Authors: Kristoffer K. Wickstrøm, Thea Brüsch, Michael C. Kampffmeyer, Robert Jenssen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluation shows that REPEAT gives certainty estimates that are more intuitive, better at detecting out-of-distribution data, and more concise. Our contributions are: ... 2. Extensive evaluation across numerous feature extractors and datasets and comparison with state-of-the-art baselines. Results show that REPEAT produces more intuitive uncertainty estimates that are better at detecting out-of-distribution data and have lower complexity, compared to other state-of-the-art methods. 3. Evaluation on a downstream task where uncertainty is used to detect poisoned data in the unsupervised representation learning setting (He, Zha, and Katabi 2023). |
| Researcher Affiliation | Academia | 1Department of Physics and Technology, UiT The Arctic University of Norway 2Department of Applied Mathematics and Computer Science, Technical University of Denmark 3Norwegian Computing Center, Oslo, Norway 4Pioneer Centre for AI, University of Copenhagen, Denmark *Corresponding author: EMAIL |
| Pseudocode | No | The paper describes the methodology using prose, equations (Eq. 1-6), and an overview figure (Fig. 2), but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Code https://github.com/Wickstrom/REPEAT/ |
| Open Datasets | Yes | We use four widely used computer vision datasets; MS-COCO (Lin et al. 2014), Pascal-VOC (Everingham et al. 2009), EuroSAT (Helber et al. 2018), and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017). |
| Dataset Splits | No | In all experiments, we randomly sample 1000 images from the dataset used for evaluation. We found that this was enough samples to provide reliable estimates of performance while still being computationally tractable. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | For simplicity and reproducibility, we use the pretrained weights from PyTorch (Paszke et al. 2019) for supervised classification of ImageNet (Deng et al. 2009). |
| Experiment Setup | Yes | REPEAT design choices: In all presented results, we generate K=10 realizations of the Bernoulli RVs and use the mean to perform the thresholding. Both of these choices are determined by quantitative evaluation that is reported in App. B. As the base stochastic R-XAI method we use RELAX (Wickstrøm et al. 2023), due to its high performance in recent works. ... Specifically, we follow Wang et al. (Wang et al. 2019), where Dropout is applied to the input (Dropout probability of 0.5). Here, we create 10 Dropout-versions of each image and calculate importance using the baseline methods. Uncertainty is computed by taking the standard deviation across all 10 importance maps. ... In all experiments, we randomly sample 1000 images from the dataset used for evaluation. ... RELAX and REPEAT experiments were repeated 3 times. |
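The baseline uncertainty scheme quoted in the Experiment Setup row (input Dropout with probability 0.5, 10 perturbed copies per image, standard deviation across the resulting importance maps) can be sketched as follows. This is a minimal illustration, not the authors' code: `importance_fn` is a hypothetical stand-in for whichever attribution method is used as the baseline, and the lack of rescaling after masking is an assumption, since the quoted setup does not specify it.

```python
import numpy as np


def dropout_uncertainty(image, importance_fn, n_samples=10, p=0.5, seed=0):
    """Input-Dropout baseline uncertainty, as described in the setup:
    create `n_samples` Dropout-versions of the image, compute an
    importance map for each, and report the per-pixel mean (importance)
    and standard deviation (uncertainty) across the maps.

    `importance_fn` is a hypothetical attribution backend mapping an
    image to an importance map of the same spatial shape.
    """
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_samples):
        keep = rng.random(image.shape) >= p  # Bernoulli keep-mask, drop prob p
        maps.append(importance_fn(image * keep))  # no rescaling (assumption)
    maps = np.stack(maps)
    return maps.mean(axis=0), maps.std(axis=0)
```

With the identity function as a dummy attribution backend, the call returns one mean map and one uncertainty map, each with the same shape as the input image.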