Linearization Turns Neural Operators into Function-Valued Gaussian Processes

Authors: Emilia Magnani, Marvin Pförtner, Tobias Weber, Philipp Hennig

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate linearized predictive uncertainty (LUNO) against sample-based approaches (Sample), which require additional approximations to impose a Gaussian process structure over the output space. ... We evaluate the predictive uncertainty using standard metrics: the expected root mean squared error (RMSE) of the mean predictions, the expected marginal χ² statistic, and the expected marginal negative log-likelihood (NLL) over 250 test input-output pairs. ... We demonstrate the capabilities of the framework in a case study on Fourier neural operators.
Researcher Affiliation Academia Tübingen AI Center, University of Tübingen, Tübingen, Germany. Correspondence to: Emilia Magnani <EMAIL>.
Pseudocode No The paper describes the methodology and algorithms using mathematical formulations and textual descriptions, but it does not include any explicit 'Pseudocode' or 'Algorithm' blocks.
Open Source Code No Code. We provide an efficient implementation of the LUNO framework in JAX (Bradbury et al., 2018) at /MethodsOfMachineLearning/luno. The code for our experiments can be found at /2bys/luno-experiments.
Open Datasets Yes To evaluate the performance of the uncertainty quantification methods discussed, we utilize the code in APEBench for generating data from the Burgers, Hyper-Diffusion, and Kuramoto-Sivashinsky (conservative) equations (cf. Koehler et al. (2024) for more details). Table 3 summarizes the characteristics of the datasets we use, the number of trajectories for training and testing, as well as the spatial and temporal resolutions.
Dataset Splits Yes Table 3: Summary of PDE datasets generated using APEBench.
PDE Name | Dim. | Training Traj. | Valid. Traj. | Test Traj. | Spatial Res. | Temp. Res.
Burgers | 1D | 25 | 250 | 250 | 256 | 59
Hyper Diffusion | 1D | 25 | 250 | 250 | 256 | 59
Kuramoto-Sivashinsky (cons.) | 1D | 25 | 250 | 250 | 256 | 59
Hardware Specification No The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models.
Software Dependencies No All training implementations rely on jax (Bradbury et al., 2018), Flax NNX and optax.
Experiment Setup Yes For all experiments, we consider the original Fourier neural operator architecture (Li et al., 2021) with the hyperparameter suggestions of Koehler et al. (2024), i.e., 12 modes (per spatial dimension) and 18 hidden dimensions held constant throughout the network, with a total of 4 Fourier blocks. ... Networks for the low-data experiment are trained for 100 epochs; all remaining networks are trained for 1000 epochs, where one epoch corresponds to iterating through a single input-output pair per trajectory in the training set. During training, the mean squared error loss was minimized using AdamW (Loshchilov & Hutter, 2019) combined with a cosine-decay learning-rate scheduler with warmup.
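The evaluation metrics quoted under Research Type (expected RMSE of the mean predictions, expected marginal χ² statistic, expected marginal NLL) all follow from a Gaussian marginal predictive at each test point. A minimal plain-Python sketch of those three quantities is given below; the paper's pipeline is in JAX, so the function name and this pure-Python form are illustrative assumptions, not the authors' code.

```python
import math

def gaussian_uq_metrics(y_true, mean, var):
    """Marginal Gaussian UQ metrics over paired predictions (illustrative sketch).

    Given per-point predictive means and variances, computes:
      - RMSE of the mean predictions,
      - mean marginal chi-squared statistic (~1 if variances are calibrated),
      - mean marginal Gaussian negative log-likelihood.
    """
    n = len(y_true)
    sq_err = [(y - m) ** 2 for y, m in zip(y_true, mean)]
    rmse = math.sqrt(sum(sq_err) / n)
    # Squared standardized residuals, averaged over test points.
    chi2 = sum(e / v for e, v in zip(sq_err, var)) / n
    # Per-point Gaussian NLL, averaged over test points.
    nll = sum(0.5 * (math.log(2 * math.pi * v) + e / v)
              for e, v in zip(sq_err, var)) / n
    return rmse, chi2, nll
```

In the paper's setup these averages run over 250 test input-output pairs; here they run over whatever arrays are passed in.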
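The training recipe in the Experiment Setup row pairs AdamW with a cosine-decay learning-rate schedule with warmup. As a sketch of what that schedule computes (the authors use a JAX/optax implementation; the function name, signature, and linear-warmup choice here are assumptions for illustration):

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Cosine-decay learning rate with linear warmup (illustrative sketch).

    Linearly ramps from ~0 to base_lr over warmup_steps, then decays to 0
    along a half cosine over the remaining steps.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay phase, clamped so late steps stay at 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

The schedule peaks at base_lr exactly when warmup ends and reaches 0 at total_steps, which matches the usual shape of warmup-plus-cosine-decay schedulers.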