Neural Bayes estimators for censored inference with peaks-over-threshold models
Authors: Jordan Richards, Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Raphaël Huser
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to make inference with popular extremal dependence models, such as max-stable, r-Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, precluding the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds-of-thousands of high-dimensional spatial extremal dependence models to assess extreme particulate matter 2.5 microns or less in diameter (PM2.5) concentration over the whole of Saudi Arabia. |
| Researcher Affiliation | Academia | Jordan Richards EMAIL School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, Edinburgh, United Kingdom; Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia. Matthew Sainsbury-Dale EMAIL Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia; School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia. Andrew Zammit-Mangion EMAIL School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia. Raphaël Huser EMAIL Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia. |
| Pseudocode | No | The paper describes its methodology in prose and includes architectural tables for neural networks, but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our methodology has been incorporated into the user-friendly open-source software package NeuralEstimators (Sainsbury-Dale, 2024), which is available in Julia and R. The remainder of this paper is organised as follows. In Section 2.1, we review NBEs. In Section 2.2, we detail our methodology for handling censored inputs. The paper concludes with a discussion in Section 6. Reproducible code is available from https://github.com/Jbrich95/Censored Neural Estimators. |
| Open Datasets | Yes | Data are taken from version 5 (V5.GL.02) of the global ground-level product developed by Van Donkelaar et al. (2021), which consists of monthly mean surface PM2.5 concentrations (µg/m³) from 1998 to 2020 (m = 276); the product is constructed by combining satellite data from a number of sources with ground-level observations, through a geographically-weighted regression, and samples over land only. |
| Dataset Splits | No | The paper discusses training and validating Neural Bayes Estimators (NBEs) using parameter sets, not explicit training/test/validation splits of the observational data itself. For example: "NBEs are trained and validated with parameter sets of length K and K/5, respectively, where K differs between studies." |
| Hardware Specification | No | NBEs are trained on GPUs randomly selected from within KAUST's Ibex cluster, whilst likelihood-based estimators are evaluated on CPUs from KAUST's Shaheen II supercomputer; see https://docs.hpc.kaust.edu.sa/systems/ibex and https://www.hpc.kaust.edu.sa/content/shaheen-ii-0 for details (last accessed 19/12/2024). The paper names the clusters used but does not specify the exact GPU/CPU models or other hardware details within the text. |
| Software Dependencies | Yes | Our methodology has been incorporated into the user-friendly open-source software package NeuralEstimators (Sainsbury-Dale, 2024), which is available in Julia and R. |
| Experiment Setup | Yes | We use m = 200 independent replicates throughout, and the pre-training scheme described in Section 2.1, with m = (10, 50, 100, 200); L(·, ·) in (2) is taken to be the absolute-error loss. NBEs are trained and validated with parameter sets of length K and K/5, respectively, where K differs between studies. We assume that the parameters are a priori independent with marginal prior distributions λ ∼ Unif(2, 10), κ ∼ Unif(0.5, 2) and δ ∼ Unif(0, 1) across all models. We train each NBE using m = (46, 138, 276). To improve model training and reduce computational memory requirements, we adopt simulation-on-the-fly (for an overview, see Chan et al., 2018; Gerber and Nychka, 2021; Sainsbury-Dale et al., 2024a). For training, we begin with initial training and validation parameter sets of size |ϑ_train| = K and |ϑ_val| = K/5, respectively. Before every 30th epoch, the parameters are refreshed and new values are drawn from the prior and, at the end of every fifth epoch, new training data are simulated using the current parameter sets. We use the maximum computationally-feasible value of K, which, due to increasing computational expense, changes with the dimension G; we take K equal to 750,000, 330,000, 100,000 and 38,000 for dimension 4 or 8, 16, 24 and 32, respectively. The architecture of each NBE is also dependent on G; these are given in Appendix E. The parameters are a priori independent with priors λ ∼ Unif(20, 1250), κ ∼ Unif(0.1, 4), δ ∼ Unif(0, 1), α ∼ Unif(0.5, 3.5) and ω ∼ Unif(−π/2, 0). |
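The simulation-on-the-fly schedule quoted above (prior draws for training/validation sets of length K and K/5, parameters refreshed before every 30th epoch, training data resimulated at every fifth epoch) can be sketched as below. This is an illustrative Python sketch, not the authors' Julia/R implementation: `sample_prior` uses the first study's priors quoted in the table, and `simulate_data` is a placeholder standing in for simulation from the extremal dependence model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(n, rng):
    """Draw n parameter vectors from the independent marginal priors
    quoted in the report: lambda ~ Unif(2, 10), kappa ~ Unif(0.5, 2),
    delta ~ Unif(0, 1)."""
    lam = rng.uniform(2.0, 10.0, n)
    kappa = rng.uniform(0.5, 2.0, n)
    delta = rng.uniform(0.0, 1.0, n)
    return np.stack([lam, kappa, delta], axis=1)

def simulate_data(theta, m, rng):
    """Placeholder simulator: stands in for drawing m independent
    replicates of the spatial process at each parameter vector."""
    return rng.normal(size=(theta.shape[0], m))

K, m, n_epochs = 1000, 200, 90          # K is much larger in the paper
theta_train = sample_prior(K, rng)       # training set of length K
theta_val = sample_prior(K // 5, rng)    # validation set of length K/5
Z_train = simulate_data(theta_train, m, rng)

for epoch in range(1, n_epochs + 1):
    if epoch % 30 == 0:                  # refresh parameters every 30th epoch
        theta_train = sample_prior(K, rng)
        theta_val = sample_prior(K // 5, rng)
    if epoch % 5 == 0:                   # resimulate data every 5th epoch
        Z_train = simulate_data(theta_train, m, rng)
    # ... one gradient-based training pass on (theta_train, Z_train),
    # monitoring the absolute-error loss on the validation set ...
```

Refreshing data rather than storing one large fixed training set is what keeps memory requirements modest while still exposing the network to effectively unlimited simulated data.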