Rethinking Aleatoric and Epistemic Uncertainty
Authors: Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark Van Der Wilk, Adam Foster, Tom Rainforth
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 4 we demonstrate that the approximation EIG_θ ≈ IG_z(y_{n+1:∞}) from Proposition 5 can be coarse. These results were produced with extremely simple setups within which we can perform exact inference and are sure to recover the true data-generating process in the limit of infinite data (see Appendix C for details). We therefore know that the estimation error is due to a failure of the model to accurately simulate future data, which in turn is due to n being finite. In Figure 5, BALD outperforms predictive entropy as a data-acquisition objective in active learning, even though BALD tends to be a worse estimator of long-run predictive information gain in the setups studied. These results were produced using the experimental setups described in Bickford Smith et al. (2023; 2024). Appendix C (implementation details): Code to generate Figures 3 and 4 is available at github.com/fbickfordsmith/rethinking-aleatoric-epistemic. |
| Researcher Affiliation | Academia | ¹University of Oxford. Correspondence to: Freddie Bickford Smith <EMAIL>. |
| Pseudocode | No | The paper describes methods and concepts using mathematical notation and prose but does not include any clearly labeled pseudocode or algorithm blocks in a structured format. |
| Open Source Code | Yes | Code to generate Figures 3 and 4 is available at github.com/fbickfordsmith/rethinking-aleatoric-epistemic. |
| Open Datasets | Yes | In Figure 5, BALD outperforms predictive entropy as a data-acquisition objective in active learning... These results were produced using the experimental setups described in Bickford Smith et al. (2023; 2024), which use Curated MNIST and Coarse Image Net. |
| Dataset Splits | No | The paper mentions datasets like Curated MNIST and Coarse Image Net and describes sampling data for synthetic cases (e.g., "sample four datasets, y_{1:n}, with y_i ∼ p_train(y) and n ∈ {1, 10, 100, 1000}"). However, it does not explicitly provide specific train/test/validation splits (percentages or counts) or reference standard splits within the main text for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments. |
| Experiment Setup | Yes | C.1 (Figure 3): We consider predicting an output, z ∈ ℝ, corresponding to an input, x ∈ ℝ... We use this to compute a Gaussian-process predictive posterior, p_n(z|x) = p(z|x, y_{1:n}), based on a generative model comprising a Gaussian likelihood function, p(z|x, θ) = Normal(z|θ(x), σ²), where σ = 0.1, and a Gaussian-process prior, θ ∼ GP(0, k), where k(x, x′) = exp(−(x − x′)²/2). C.2 (Figure 4): In the discrete case we have y ∈ {0, 1}, with data generated from p_train(y) = Bernoulli(y|η = 0.5)... In the continuous case we have y ∈ ℝ, with data generated from p_train(y) = Normal(y|µ = 1, σ² = 1)... |
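The setup quoted above for Figure 3 (squared-exponential kernel, zero-mean GP prior, Gaussian likelihood with σ = 0.1) is standard Gaussian-process regression with exact inference. A minimal NumPy sketch of the predictive posterior it describes; `gp_posterior` is an illustrative helper, not code from the paper's repository:

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, sigma=0.1):
    """Exact GP predictive posterior p_n(z|x) under the kernel
    k(x, x') = exp(-(x - x')^2 / 2) and Gaussian noise sigma."""
    def k(a, b):
        # Squared-exponential kernel on all pairs of 1-D inputs.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

    # Train-train covariance plus likelihood noise on the diagonal.
    K = k(x_train, x_train) + sigma**2 * np.eye(len(x_train))
    K_star = k(x_test, x_train)

    # Posterior mean and covariance of the latent function theta(x).
    mean = K_star @ np.linalg.solve(K, y_train)
    cov = k(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)

    # Predictive variance of z adds the likelihood noise back in.
    var = np.diag(cov) + sigma**2
    return mean, var
```

Far from the training data the predictive variance reverts to the prior variance plus noise (1 + σ²), which is the behaviour the paper's toy setups rely on.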
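The Figure 5 result quoted in the Research Type row compares BALD with predictive entropy as acquisition objectives. As background, BALD scores a candidate input by the mutual information between its label and the model parameters, estimated as predictive entropy minus expected conditional entropy over posterior parameter draws. A Monte Carlo sketch (the helper name `bald_score` is hypothetical, not from the paper):

```python
import numpy as np

def bald_score(probs):
    """BALD mutual information from Monte Carlo parameter draws.
    probs: array of shape (n_draws, n_classes), each row the class
    probabilities under one posterior sample of the parameters."""
    eps = 1e-12  # guard against log(0)
    mean_probs = probs.mean(axis=0)
    # Entropy of the marginal predictive distribution.
    entropy_of_mean = -np.sum(mean_probs * np.log(mean_probs + eps))
    # Expected entropy conditional on the parameters.
    mean_of_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return entropy_of_mean - mean_of_entropy
```

When the posterior draws disagree (e.g., rows [0.9, 0.1] and [0.1, 0.9]) the score is large; when they agree it is zero, regardless of how uncertain each individual prediction is.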