Calibrated Uncertainty Quantification for Operator Learning via Conformal Prediction
Authors: Ziqi Ma, David Pitt, Kamyar Azizzadenesheli, Anima Anandkumar
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results, demonstrating calibrated coverage and efficient uncertainty bands outperforming baseline methods. |
| Researcher Affiliation | Collaboration | Ziqi Ma EMAIL California Institute of Technology David Pitt EMAIL California Institute of Technology Kamyar Azizzadenesheli EMAIL NVIDIA Anima Anandkumar EMAIL California Institute of Technology |
| Pseudocode | Yes | Algorithm 1: Risk-Controlling Quantile Neural Operator |
| Open Source Code | Yes | Code is available at https://github.com/neuraloperator/neuraloperator/tree/main (UQNO module). |
| Open Datasets | Yes | Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results [...] This is a data-rich scenario with 5000 total training data and 421 × 421 resolution, for which we obtain the ground truth from prior work Li et al. (2021). [...] The car surface is represented as a 3D mesh of 3586 points [...] car shapes from Umetani & Bickel (2018) modified from the ShapeNet dataset Chang et al. (2015) car category. |
| Dataset Splits | No | For UQNO, we split the training set in half for training the base and the quantile model. [...] This is a data-rich scenario with 5000 total training data [...] This is a data-scarce setting with only 500 total training samples. While training-set sizes are mentioned, overall train/validation/test splits are not explicitly provided. |
| Hardware Specification | Yes | Training time is approximate GPU hours on a single RTX4090. |
| Software Dependencies | No | The paper discusses neural operator architectures and methods but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We fix the same Fourier Neural Operator architecture for all methods. [...] MCDropout Gal & Ghahramani (2016): which predicts uncertainty by aggregating results from multiple (we use 10) models trained with random dropout [...] Deep Ensemble Lakshminarayanan et al. (2017): which predicts uncertainty by aggregating results from an ensemble (we use 10) of models [...] For both tasks, we show a high-domain-threshold scenario (α = 0.02 for Darcy and α = 0.04 for car, α values larger for car due to its lower resolution due to the correction term t in Equation 7) and a low-domain-threshold scenario (α = 0.1 for Darcy and α = 0.12 for car). In the Darcy problem, where we have sufficient data, most methods satisfy our calibration target of 98%... |
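The calibration step behind the paper's risk-controlling quantile approach is a split-conformal procedure: held-out residuals are rescaled by a heuristic uncertainty estimate, and a finite-sample-corrected quantile of those scores yields bands with coverage at least 1 − α. The sketch below is a simplified pointwise (not function-space) version under assumed inputs: `residuals` and `quantile_pred` are hypothetical NumPy arrays standing in for the base model's calibration errors and the quantile model's predicted band widths; it is not the authors' released UQNO implementation.

```python
import numpy as np

def conformal_scale(residuals, quantile_pred, alpha):
    """Split-conformal calibration sketch.

    Finds a scale s such that |y - f(x)| <= s * q(x) holds for at
    least a ceil((n+1)(1-alpha))/n fraction of calibration points,
    which implies >= 1 - alpha coverage on exchangeable test data.
    """
    n = len(residuals)
    scores = residuals / quantile_pred  # conformity scores
    # finite-sample correction: quantile level ceil((n+1)(1-alpha))/n
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    # "higher" avoids interpolation, keeping the coverage guarantee
    return np.quantile(scores, level, method="higher")

# Illustrative usage with synthetic calibration data (hypothetical).
rng = np.random.default_rng(0)
residuals = np.abs(rng.standard_normal(500))
quantile_pred = np.full(500, 1.0)  # trivial heuristic band widths
s = conformal_scale(residuals, quantile_pred, alpha=0.1)
coverage = np.mean(residuals <= s * quantile_pred)
```

By construction, `coverage` on the calibration set is at least 1 − α; the statistical guarantee for fresh test points follows from exchangeability, as in the paper's theoretical results.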