Rethinking Aleatoric and Epistemic Uncertainty

Authors: Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, Tom Rainforth

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
In Figure 4 we demonstrate that the approximation EIG_θ ≈ IG_z(y_{n+1:∞}) from Proposition 5 can be coarse. These results were produced with extremely simple setups in which we can perform exact inference and are sure to recover the true data-generating process in the limit of infinite data (see Appendix C for details). We therefore know that the estimation error is due to a failure of the model to accurately simulate future data, which in turn is due to n being finite. Figure 5: BALD outperforms predictive entropy as a data-acquisition objective in active learning, even though BALD tends to be a worse estimator of long-run predictive information gain in the setups studied. These results were produced using the experimental setups described in Bickford Smith et al. (2023; 2024). Appendix C (implementation details): code to generate Figures 3 and 4 is available at github.com/fbickfordsmith/rethinking-aleatoric-epistemic.
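The Figure 5 comparison above rests on the standard definitions of the two acquisition objectives: predictive entropy is the entropy of the posterior-mean predictive distribution, and BALD is that entropy minus the expected entropy of the per-sample predictives. A minimal Monte Carlo sketch of both scores, assuming posterior samples of class probabilities (the function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def acquisition_scores(probs, eps=1e-12):
    """probs: array of shape (S, C) holding S posterior samples of
    class probabilities for one candidate input.

    Returns (bald, predictive_entropy), where
      predictive_entropy = H[ mean_theta p(y|x, theta) ]   (total uncertainty)
      bald = predictive_entropy - mean_theta H[ p(y|x, theta) ].
    """
    mean = probs.mean(axis=0)                                   # posterior-mean predictive
    pred_entropy = -np.sum(mean * np.log(mean + eps))           # total uncertainty
    exp_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return pred_entropy - exp_entropy, pred_entropy
```

When all posterior samples agree, BALD is zero even if predictive entropy is high; BALD is large when the samples are individually confident but disagree with one another, which is why the two objectives can rank candidate points differently.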
Researcher Affiliation: Academia
University of Oxford. Correspondence to Freddie Bickford Smith <EMAIL>.
Pseudocode: No
The paper describes methods and concepts using mathematical notation and prose but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
Code to generate Figures 3 and 4 is available at github.com/fbickfordsmith/rethinking-aleatoric-epistemic.
Open Datasets: Yes
Figure 5: BALD outperforms predictive entropy as a data-acquisition objective in active learning... These results were produced using the experimental setups described in Bickford Smith et al. (2023; 2024), which use Curated MNIST and Coarse ImageNet.
Dataset Splits: No
The paper mentions datasets such as Curated MNIST and Coarse ImageNet and describes sampling data for synthetic cases (e.g., "sample four datasets, y_{1:n}, with y_i ~ p_train(y) and n ∈ {1, 10, 100, 1000}"). However, it does not explicitly provide train/test/validation splits (percentages or counts) or reference standard splits within the main text for reproducibility.
Hardware Specification: No
The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No
The paper does not specify software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup: Yes
C.1 (Figure 3): We consider predicting an output, z ∈ R, corresponding to an input, x ∈ R... We use this to compute a Gaussian-process predictive posterior, p_n(z|x) = p(z|x, y_{1:n}), based on a generative model comprising a Gaussian likelihood function, p(z|x, θ) = Normal(z|θ(x), σ²), where σ = 0.1, and a Gaussian-process prior, θ ~ GP(0, k), where k(x, x′) = exp(−(x − x′)²/2). C.2 (Figure 4): In the discrete case we have y ∈ {0, 1}, with data generated from p_train(y) = Bernoulli(y|η = 0.5)... In the continuous case we have y ∈ R, with data generated from p_train(y) = Normal(y|µ = 1, σ² = 1)...
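The C.1 model (Gaussian likelihood with σ = 0.1 and a zero-mean GP prior with squared-exponential kernel k(x, x′) = exp(−(x − x′)²/2)) admits exact inference via the standard GP-regression conditioning formulas, which is what makes the estimation-error diagnosis in the Research Type entry possible. A sketch of that exact predictive posterior, with the function name chosen for illustration:

```python
import numpy as np

def gp_predictive(x_train, y_train, x_test, sigma=0.1):
    """Exact GP predictive posterior p_n(z|x) under the C.1 model:
    likelihood Normal(z | theta(x), sigma^2), prior theta ~ GP(0, k)."""
    def k(a, b):  # squared-exponential kernel, unit lengthscale and variance
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

    K = k(x_train, x_train) + sigma**2 * np.eye(len(x_train))  # noisy Gram matrix
    Ks = k(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                                          # posterior mean of theta(x)
    cov = k(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)    # posterior cov of theta(x)
    # the predictive over the noisy output z adds the observation noise sigma^2
    return mean, cov + sigma**2 * np.eye(len(x_test))
```

Because the posterior is available in closed form here, any gap between an information-gain estimate and its long-run target reflects the finite-n model of future data rather than approximate inference.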