Getting a CLUE: A Method for Explaining Uncertainty Estimates
Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty. |
| Researcher Affiliation | Academia | Javier Antorán (University of Cambridge), Umang Bhatt (University of Cambridge), Tameem Adel (University of Cambridge; University of Liverpool), Adrian Weller (University of Cambridge; The Alan Turing Institute), José Miguel Hernández-Lobato (University of Cambridge; The Alan Turing Institute) |
| Pseudocode | Yes | The CLUE algorithm and a diagram of our procedure are provided in Algorithm 1 and Figure 4, respectively. |
| Open Source Code | Yes | Our code is at: github.com/cambridge-mlg/CLUE. |
| Open Datasets | Yes | We validate CLUE on LSAT academic performance regression (Wightman et al., 1998), UCI Wine quality regression, UCI Credit classification (Dua & Graff, 2017), a 7-feature variant of COMPAS recidivism classification (Angwin et al., 2016), and MNIST image classification (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | For each dataset, we select roughly the 20% most uncertain test points as those for which we reject our BNN's decisions. We only generate CLUEs for rejected points. Rejection thresholds, architectures, and hyperparameters are in Appendix B. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RAdam optimizer (Liu et al., 2020)' but does not specify software versions for frameworks or libraries like PyTorch, TensorFlow, etc., or for the optimizer itself. |
| Experiment Setup | Yes | Optimization runs for a minimum of 3 iterations and a maximum of 35 iterations, with a learning rate of 0.1. [...] We use a fixed step size of ϵ = 0.01 and batch sizes of 512. [...] We train all generative models with the RAdam optimizer (Liu et al., 2020) with a learning rate of 1e-4 for tabular data and 3e-4 for MNIST. [...] All architectural hyperparameters are provided in Table 4. [...] The rejection thresholds used for each dataset are displayed in Table 5. The same table contains the values of λx used in all experiments. |
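The setup above describes CLUE's core loop: gradient descent in a generative model's latent space on an objective that trades off predictive uncertainty against a λx-weighted distance to the original input. The toy sketch below illustrates only the shape of that objective, not the paper's implementation: the linear "decoder" `W`, the quadratic `uncertainty` surrogate for predictive entropy H(y|x), and the hand-derived gradients are all assumptions chosen so the example is self-contained and analytic.

```python
import numpy as np

# Assumed toy stand-ins (NOT the paper's models):
# - decode(z): a fixed linear "decoder" from 2-dim latents to 4-dim inputs
# - uncertainty(x): a smooth quadratic surrogate for predictive entropy H(y|x)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [-0.5, 0.5]])            # decoder weights
A = np.diag([2.0, 1.0, 0.5, 0.25])     # curvature of the toy uncertainty surface

def decode(z):
    return W @ z

def uncertainty(x):
    return 0.5 * x @ A @ x             # surrogate for H(y|x)

def clue(x0, lam=0.5, lr=0.1, steps=35):
    """Gradient descent on L(z) = H(decode(z)) + lam * ||decode(z) - x0||^2,
    mirroring the shape of CLUE's objective (uncertainty + distance penalty).
    The paper runs 3-35 iterations at learning rate 0.1; we fix 35 steps."""
    # Initialize at a pseudo-encoding of x0 (least-squares inverse of decoder);
    # CLUE proper would use the generative model's encoder here.
    z = np.linalg.lstsq(W, x0, rcond=None)[0]
    for _ in range(steps):
        x = decode(z)
        grad_x = A @ x + 2 * lam * (x - x0)  # dL/dx for the quadratic toy terms
        z = z - lr * (W.T @ grad_x)          # chain rule through the decoder
    return decode(z)

x0 = np.array([1.0, -1.0, 0.5, 2.0])
x_clue = clue(x0)
```

In this toy setting the counterfactual `x_clue` ends up with lower surrogate uncertainty than the (projected) original input while the λx-style penalty keeps it nearby, which is the behavior the paper's evaluation framework measures.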