Universal Functional Regression with Neural Operator Flows
Authors: Yaozhong Shi, Angela F Gao, Zachary E Ross, Kamyar Azizzadenesheli
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study the performance of OpFlow on regression and generation tasks with data generated from Gaussian processes with known posterior forms and non-Gaussian processes, as well as real-world earthquake seismograms with an unknown closed-form distribution. In this section, we aim to provide a comprehensive evaluation of OpFlow’s capabilities for UFR, as well as generation tasks. We consider several ground truth datasets, composed of both Gaussian and non-Gaussian processes, as well as a real-world dataset of earthquake recordings with highly non-Gaussian characteristics. |
| Researcher Affiliation | Collaboration | Yaozhong Shi EMAIL California Institute of Technology Angela F. Gao EMAIL California Institute of Technology Zachary E. Ross EMAIL California Institute of Technology Kamyar Azizzadenesheli EMAIL NVIDIA Corporation |
| Pseudocode | Yes | Algorithm 1 OpFlow Training ... Algorithm 2 Sample from posterior using SGLD ... Algorithm 3 Forward and Inverse processes of OpFlow |
| Open Source Code | Yes | https://github.com/yzshi5/OpFlow |
| Open Datasets | Yes | real-world earthquake seismograms ... Japan Kiban Kyoshin network (KiK-net). The raw data are provided by the Japanese agency, National Research Institute for Earth Science and Disaster Prevention (NIED, 2013). The KiK-net data encompasses 20,643 time series of ground velocity for earthquakes with magnitudes in the range 4.5 to 5, recorded between 1997 and 2017. |
| Dataset Splits | Yes | We build a training dataset for the GP prior, taking ℓ_U = 0.5, ν_U = 1.5, and the training dataset contains 30000 realizations. For the training dataset, we use 30000 realizations of the TGP. We generate 20000 samples in total for training. After training the OpFlow prior, we select a new (unseen) time series randomly for testing (Fig. 6a), and perform UFR over the entire domain with 60 randomly selected observation points. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The implementation of GANO used herein follows that of the paper (Rahman et al., 2022a). In the following experiments, the implementations of the GP for the Gaussian space A of GANO are adapted to optimize GANO’s performance for each specific case. Unless otherwise specified, the default GP implementation utilizes the Gaussian process package available in Scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | Table 2: Datasets and regression parameters — GP: total iterations N = 4e4, burn-in b = 2e3, sample iterations t_N = 10, noise temperature T = 1, initial learning rate η_0 = 5e-3, end learning rate η_N = 4e-3; TGP: N = 4e4, b = 2e3, t_N = 10, T = 1, η_0 = 5e-3, η_N = 4e-3; GRF: N = 2e4, b = 2e3, t_N = 10, T = 1, η_0 = 5e-3, η_N = 4e-3; Seismic: N = 4e4, b = 2e3, t_N = 10, T = 1, η_0 = 1e-3, η_N = 8e-4; Codomain GP: N = 4e4, b = 2e3, t_N = 10, T = 1, η_0 = 5e-5, η_N = 4e-5. |
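
The quoted Algorithm 2 samples from the posterior with SGLD, driven by the iteration count, burn-in, thinning, temperature, and decayed learning rates listed in the Experiment Setup row. The following is a minimal sketch of such a loop on a toy log-density, not the authors' implementation: the standard-normal target, the function names, and the linear learning-rate decay are all assumptions made here for illustration.

```python
import numpy as np

def sgld_sample(grad_log_p, theta0, n_iters=40_000, burn_in=2_000,
                thin=10, temperature=1.0, lr_start=5e-3, lr_end=4e-3,
                seed=None):
    """SGLD sampler; learning rate decays linearly from lr_start to lr_end.

    Keeps one draw every `thin` iterations after `burn_in`.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    samples = []
    for t in range(n_iters):
        eta = lr_start + (lr_end - lr_start) * t / (n_iters - 1)
        noise = rng.standard_normal(theta.shape)
        # Langevin update: gradient step plus temperature-scaled noise.
        theta = theta + 0.5 * eta * grad_log_p(theta) \
                + np.sqrt(eta * temperature) * noise
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(theta.copy())
    return np.array(samples)

# Toy target: standard normal, so grad log p(x) = -x.
draws = sgld_sample(lambda x: -x, theta0=np.zeros(1), seed=0)
```

With the GP row's settings (N = 4e4 iterations, b = 2e3 burn-in, thinning every 10, T = 1), the retained draws approximate the stationary distribution of the toy target; in the paper the gradient would instead come from the OpFlow posterior log-density.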
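
The Dataset Splits row quotes Matérn hyperparameters ℓ_U = 0.5 and ν_U = 1.5 for the GP training realizations. As a self-contained sketch of drawing such realizations on a 1-D grid via a Cholesky factor of the Matérn-3/2 covariance (the unit-interval grid, grid size, jitter, and function names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def matern32_kernel(x, length_scale=0.5):
    """Matérn covariance with nu = 1.5 between all pairs of 1-D points."""
    r = np.abs(x[:, None] - x[None, :])
    a = np.sqrt(3.0) * r / length_scale
    return (1.0 + a) * np.exp(-a)

def sample_gp(n_points=128, n_realizations=4, length_scale=0.5, seed=None):
    """Draw GP realizations on [0, 1] by coloring white noise with chol(K)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    K = matern32_kernel(x, length_scale) + 1e-6 * np.eye(n_points)  # jitter
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((n_points, n_realizations))
    return x, (L @ z).T  # each row is one realization

x, fs = sample_gp(seed=0)
```

Repeating the draw 30000 times (the quoted training-set size) would yield the kind of function-valued dataset the review describes; only the kernel form and the ℓ_U, ν_U values come from the source.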