Getting a CLUE: A Method for Explaining Uncertainty Estimates
Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty. |
| Researcher Affiliation | Academia | Javier Antorán (University of Cambridge), Umang Bhatt (University of Cambridge), Tameem Adel (University of Cambridge; University of Liverpool), Adrian Weller (University of Cambridge; The Alan Turing Institute), José Miguel Hernández-Lobato (University of Cambridge; The Alan Turing Institute) |
| Pseudocode | Yes | The CLUE algorithm and a diagram of our procedure are provided in Algorithm 1 and Figure 4, respectively. |
| Open Source Code | Yes | Our code is at: github.com/cambridge-mlg/CLUE. |
| Open Datasets | Yes | We validate CLUE on LSAT academic performance regression (Wightman et al., 1998), UCI Wine quality regression, UCI Credit classification (Dua & Graff, 2017), a 7-feature variant of COMPAS recidivism classification (Angwin et al., 2016), and MNIST image classification (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | For each dataset, we select roughly the 20% most uncertain test points as those for which we reject our BNN's decisions. We only generate CLUEs for rejected points. Rejection thresholds, architectures, and hyperparameters are in Appendix B. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RAdam optimizer (Liu et al., 2020)' but does not specify software versions for frameworks or libraries like PyTorch, TensorFlow, etc., or for the optimizer itself. |
| Experiment Setup | Yes | Optimization runs for a minimum of 3 iterations and a maximum of 35 iterations, with a learning rate of 0.1. [...] We use a fixed step size of ϵ = 0.01 and batch sizes of 512. [...] We train all generative models with the RAdam optimizer (Liu et al., 2020) with a learning rate of 1e-4 for tabular data and 3e-4 for MNIST. [...] All architectural hyperparameters are provided in Table 4. [...] The rejection thresholds used for each dataset are displayed in Table 5. The same table contains the values of λx used in all experiments. |
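The setup above describes CLUE's core loop: gradient descent in a generative model's latent space on an objective that trades off predictive uncertainty against a λx-weighted distance to the original input. The toy sketch below illustrates only the shape of that objective, not the paper's implementation: the linear "decoder" `W`, the quadratic `uncertainty` surrogate for predictive entropy H(y|x), and the hand-derived gradients are all assumptions chosen so the example is self-contained and analytic.

```python
import numpy as np

# Assumed toy stand-ins (NOT the paper's models):
# - decode(z): a fixed linear "decoder" from 2-dim latents to 4-dim inputs
# - uncertainty(x): a smooth quadratic surrogate for predictive entropy H(y|x)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [-0.5, 0.5]])            # decoder weights
A = np.diag([2.0, 1.0, 0.5, 0.25])     # curvature of the toy uncertainty surface

def decode(z):
    return W @ z

def uncertainty(x):
    return 0.5 * x @ A @ x             # surrogate for H(y|x)

def clue(x0, lam=0.5, lr=0.1, steps=35):
    """Gradient descent on L(z) = H(decode(z)) + lam * ||decode(z) - x0||^2,
    mirroring the shape of CLUE's objective (uncertainty + distance penalty).
    The paper runs 3-35 iterations at learning rate 0.1; we fix 35 steps."""
    # Initialize at a pseudo-encoding of x0 (least-squares inverse of decoder);
    # CLUE proper would use the generative model's encoder here.
    z = np.linalg.lstsq(W, x0, rcond=None)[0]
    for _ in range(steps):
        x = decode(z)
        grad_x = A @ x + 2 * lam * (x - x0)  # dL/dx for the quadratic toy terms
        z = z - lr * (W.T @ grad_x)          # chain rule through the decoder
    return decode(z)

x0 = np.array([1.0, -1.0, 0.5, 2.0])
x_clue = clue(x0)
```

In this toy setting the counterfactual `x_clue` ends up with lower surrogate uncertainty than the (projected) original input while the λx-style penalty keeps it nearby, which is the behavior the paper's evaluation framework measures.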