Towards Multi-spatiotemporal-scale Generalized PDE Modeling

Authors: Jayesh K. Gupta, Johannes Brandstetter

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we make such comprehensive comparisons regarding performance, runtime complexity, memory requirements, and generalization capabilities. Concretely, we stress-test various FNO, (Dilated) ResNet, and U-Net like approaches to fluid mechanics problems in both vorticity-stream and velocity function form." "Figure 1: Example rollout trajectories of the best-performing U-Net model..." "We establish the following set of desiderata for our benchmarks..." (Section 4, Experiments) "Table 1: Comparison of parameter count, runtime, and memory requirement of the tested architectures..." "Figure 4: One-step errors for modeling different PDEs, shown for different number of training trajectories."
Researcher Affiliation | Industry | Jayesh K. Gupta (EMAIL), Microsoft Autonomous Systems and Robotics Research; Johannes Brandstetter (EMAIL), Microsoft Research AI4Science
Pseudocode | No | The paper describes methods and procedures in paragraph text and refers to existing architectures, but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Source code for our PyTorch benchmark framework is available at https://github.com/microsoft/pdearena."
Open Datasets | No | "We modified the implementation in SpeedyWeather.jl (Klöwer et al., 2022), obtaining data on a grid with spatial resolution of 192 × 96 (∆x = 1.875°, ∆y = 3.75°), and temporal resolution of ∆t = 48 h." "We obtained data on a grid with spatial resolution of 128 × 128 (∆x = 0.25, ∆y = 0.25), and temporal resolution of ∆t = 1.5 s using ΦFlow (Holl et al., 2020)." The paper describes the sources and methods used to *generate* the data (SpeedyWeather.jl, ΦFlow) but does not provide direct access information (e.g., specific links, DOIs, or repository names) for the *datasets themselves* that were used in the experiments.
Dataset Splits | Yes | "Results are averaged over 208 different unseen evaluation buoyancy force values between 0.2 and 0.5." "For training, we used a dataset with higher temporal resolution of ∆t = 0.375 s and get equal number of trajectories from uniformly sampling 832 different external buoyancy force values, f = (0, f)ᵀ in Equation 5, in the range 0.2 ≤ f ≤ 0.5, using input fields at one timestep."
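The split described above (832 uniformly sampled training buoyancy values versus 208 unseen evaluation values, all in [0.2, 0.5]) can be sketched as follows. `sample_buoyancy_forces` is a hypothetical helper for illustration only, not code from the pdearena repository:

```python
import random

def sample_buoyancy_forces(n, lo=0.2, hi=0.5, seed=0):
    """Uniformly sample n external buoyancy force vectors f = (0, f)^T,
    with the scalar component drawn from [lo, hi]. Hypothetical helper
    sketching the sampling described in the paper."""
    rng = random.Random(seed)
    return [(0.0, rng.uniform(lo, hi)) for _ in range(n)]

# 832 training values and 208 unseen evaluation values, as in the paper;
# distinct seeds stand in for the train/eval separation.
train_forces = sample_buoyancy_forces(832, seed=0)
eval_forces = sample_buoyancy_forces(208, seed=1)
```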
Hardware Specification | Yes | "All experiments used 4 × 16 GB NVIDIA V100 machines for training." "We warmup the benchmark for 10 iterations and report average runtimes over 100 runs on a single 16 GB NVIDIA V100 machine with input batch size of 8."
Software Dependencies | No | "Source code for our PyTorch benchmark framework is available at https://github.com/microsoft/pdearena." "We optimized models using the AdamW optimizer (Kingma & Ba, 2014; Loshchilov & Hutter, 2019)." "We used cosine annealing as learning rate scheduler (Loshchilov & Hutter, 2016)." The paper mentions software such as PyTorch, AdamW, and cosine annealing but does not specify their version numbers.
Experiment Setup | Yes | "We optimized models using the AdamW optimizer (Kingma & Ba, 2014; Loshchilov & Hutter, 2019) for 50 epochs and minimized the summed mean squared error." "We used cosine annealing as learning rate scheduler (Loshchilov & Hutter, 2016) with a linear warmup." "For FNO models, we optimized number of layers, number of channels, and number of Fourier modes." "For U-Net like architectures, especially for U-Net_att, we specifically needed to optimize the maximum learning rate to be lower (10⁻⁴)." "We used an effective batch size of 32 for training." "We used the best learning rates of [10⁻⁴, 2 × 10⁻⁴] and weight decay of 10⁻⁵."
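The schedule quoted above (linear warmup followed by cosine annealing, with maximum learning rates around 10⁻⁴ to 2 × 10⁻⁴) can be sketched as a plain learning-rate function; `warmup_steps` and `total_steps` are assumed placeholders, since the excerpt does not state the warmup length:

```python
import math

def lr_at(step, total_steps, warmup_steps, max_lr=2e-4, min_lr=0.0):
    """Linear warmup to max_lr, then cosine annealing down to min_lr.
    A sketch of the schedule described in the paper; warmup_steps is
    an assumption, not a value given in the text."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this kind of schedule can be attached to an AdamW optimizer via a scheduler such as `torch.optim.lr_scheduler.LambdaLR` built from a function like the one above.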