Probabilistic neural operators for functional uncertainty quantification

Authors: Christopher Bülte, Philipp Scholl, Gitta Kutyniok

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce the probabilistic neural operator (PNO), a framework for learning probability distributions over the output function space of neural operators. PNO extends neural operators with generative modeling based on strictly proper scoring rules, integrating uncertainty information directly into the training process. We provide a theoretical justification for the approach and demonstrate improved performance in quantifying uncertainty across different domains and with respect to different baselines. Furthermore, PNO requires minimal adjustment to existing architectures, shows improved performance for most probabilistic prediction tasks, and leads to well-calibrated predictive distributions and adequate uncertainty representations even for long dynamical trajectories. Implementing our approach into large-scale models for physical applications can lead to improvements in corresponding uncertainty quantification and extreme event identification, ultimately leading to a deeper understanding of the prediction of such surrogate models.
Researcher Affiliation | Academia | Christopher Bülte (EMAIL), Ludwig-Maximilians-Universität München, Munich Center for Machine Learning (MCML), Munich, Germany; Philipp Scholl (EMAIL), Ludwig-Maximilians-Universität München, Munich Center for Machine Learning (MCML), Munich, Germany; Gitta Kutyniok (EMAIL), Ludwig-Maximilians-Universität München, University of Tromsø, DLR-German Aerospace Center, Munich Center for Machine Learning (MCML), Munich, Germany
Pseudocode | No | The paper describes methodologies and theoretical frameworks using mathematical equations and textual descriptions, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, figures, or structured code-like procedures.
Open Source Code | Yes | All experiments were implemented in PyTorch and run on an NVIDIA RTX A6000 with 48 GB of RAM. The accompanying code can be found at https://github.com/cbuelt/pfno.
Open Datasets | Yes | We use data from Takamoto et al. (2022), with the forcing function fixed as f(x) = 1, which corresponds to the setting in Li et al. (2021). We simulate the KS-equation from random uniform noise U(-1, 1) on a periodic domain D = [0, 100] using the py-pde package (Zwicker, 2020). Similarly to Bonev et al. (2023), we simulate 5000 training samples with resolution 128 × 256 with PDE parameters that model the Earth and an additional 500 evaluation samples. We use the ERA5 dataset (Hersbach et al., 2020), provided via the benchmark dataset WeatherBench (Rasp et al., 2024).
Dataset Splits | Yes | Similarly to Bonev et al. (2023), we simulate 5000 training samples with resolution 128 × 256 with PDE parameters that model the Earth and an additional 500 evaluation samples. All experiments are evaluated on a previously unseen test set, with predictions aggregated over ten training runs with different random seeds.
Hardware Specification | Yes | All experiments were implemented in PyTorch and run on an NVIDIA RTX A6000 with 48 GB of RAM.
Software Dependencies | No | All experiments were implemented in PyTorch and run on an NVIDIA RTX A6000 with 48 GB of RAM. While PyTorch is mentioned, a specific version number is not provided, making it difficult to fully reproduce the software environment.
Experiment Setup | Yes | The models are trained for a maximum of 1000 epochs, and we use early stopping to halt training if the validation loss does not improve for ten epochs, in order to avoid overfitting. For optimization, we employ the popular Adam optimizer (Kingma & Ba, 2017) with gradient clipping. Due to its size, we use an additional learning rate scheduler for the ERA5 dataset, which halves the learning rate if the validation loss does not improve for five epochs. The PNO methods are trained with 3 generated samples, while we use an ensemble size of M = 100 across all methods for evaluation. Table 5: Overview of relevant model hyperparameters (lists specific batch sizes, learning rates, hidden channels, projection channels, lifting channels, and modes for the FNO, UNO, SFNO, and SNO architectures).
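The abstract states that PNO is trained with strictly proper scoring rules over generated samples. The paper's exact rule is not quoted above; a common strictly proper choice for multivariate outputs is the energy score, sketched here as a framework-agnostic Monte-Carlo estimate (an assumption, not the authors' verified loss):

```python
import numpy as np

def energy_score(samples, obs):
    """Monte-Carlo estimate of the energy score for m ensemble
    samples of a flattened output function against one observation.

    samples: shape (m, d); obs: shape (d,). Lower is better.
    Accuracy term minus a spread term, so the score rewards
    ensembles that are both close to the observation and diverse.
    """
    # Mean distance between each sample and the observation.
    term1 = np.mean(np.linalg.norm(samples - obs, axis=1))
    # Mean pairwise distance between samples (spread reward).
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.mean(np.linalg.norm(diffs, axis=2))
    return term1 - 0.5 * term2
```

For example, a two-member ensemble straddling the truth, `[[1.0], [-1.0]]` against `[0.0]`, scores 0.5, better than a collapsed ensemble at `[[1.0], [1.0]]`, which scores 1.0; the spread term is what distinguishes such a loss from plain regression.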
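The training schedule described in the Experiment Setup row (at most 1000 epochs, early stopping after ten stagnant validation epochs, and for ERA5 a scheduler that halves the learning rate after five) can be sketched as plain control logic; `step` and `val_loss` are hypothetical placeholders standing in for the real training and validation code:

```python
def train_with_early_stopping(step, val_loss, max_epochs=1000,
                              patience=10, lr=1e-3, lr_patience=5):
    """Sketch of the reported schedule. step(lr) runs one training
    epoch; val_loss() returns the current validation loss. Returns
    (epochs run, best validation loss, final learning rate)."""
    best, since_best, since_lr = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step(lr)
        loss = val_loss()
        if loss < best:
            best, since_best, since_lr = loss, 0, 0
        else:
            since_best += 1
            since_lr += 1
        if since_lr >= lr_patience:   # ERA5 scheduler: halve the LR
            lr *= 0.5
            since_lr = 0
        if since_best >= patience:    # early stopping
            break
    return epoch + 1, best, lr
```

With a validation loss that improves for five epochs and then plateaus, this stops after fifteen epochs, having halved the learning rate twice in the plateau.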