Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models

Authors: Congcong Zhu, Xiaoyan Xu, Jiayue Han, Jingrun Chen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that PITA significantly enhances the accuracy and robustness of existing foundation models on diverse time-dependent PDE data. The code is available at https://github.com/SCAILab-USTC/PITA. 4. Experiments. Datasets: To enable comparisons with baselines employing the auto-regressive strategy, we select 12 datasets from four different sources: 3 datasets from FNO (Li et al., 2020), 6 datasets from PDEBench (Takamoto et al., 2022), 2 datasets from PDEArena (Gupta & Brandstetter, 2022), and 1 dataset from CFDBench (Luo et al., 2023). For generalization tasks, the Burgers equation from (Boussif et al., 2022) is included as an additional dataset. Additional details regarding the datasets can be found in Appendix C. Baselines: We compare PITA with auto-regressive baselines, primarily PDE foundation models such as DPOT (Hao et al., 2024) and MPP (McCabe et al., 2023). Training and Evaluation: All experiments are carried out on a single A800 GPU with 80 GB of memory. We apply the commonly used scale-independent normalized root mean squared error (nRMSE) (Takamoto et al., 2023; Hao et al., 2023) to measure the quality of the prediction. 4.1. State-of-the-Art Results: The in-distribution performance of PITA is evaluated and presented in Table 1. 4.2. Solution to Error Accumulation: Following the error visualization methodology of (Christlieb et al., 2016), we compute and plot the rolling-step mean squared error (MSE) for each long-term forecasting dataset, as illustrated in Figure 10. 4.3. Generalizing to Downstream Tasks: To investigate the generalization performance of PITA on downstream tasks, we consider two experimental settings. 4.4. Ablation Studies: We performed five ablation studies to assess the impact of different design choices in PITA, training small-scale models on the FNO-NS-1E-3 dataset for Tasks 1-4 and on the Burgers equation for Task 5.
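The quoted text cites the scale-independent nRMSE from PDEBench but the defining formula did not survive extraction. A minimal sketch, assuming the standard definition nRMSE = ‖u_pred − u_true‖₂ / ‖u_true‖₂ averaged over samples (the function name and batch convention are this reviewer's, not the paper's):

```python
import numpy as np

def nrmse(pred, true):
    """Scale-independent normalized RMSE: ||pred - true||_2 / ||true||_2,
    averaged over samples along the leading batch dimension."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # Flatten each sample so the norm covers all spatial/temporal points.
    num = np.linalg.norm((pred - true).reshape(pred.shape[0], -1), axis=1)
    den = np.linalg.norm(true.reshape(true.shape[0], -1), axis=1)
    return float(np.mean(num / den))
```

Because the error is normalized per sample by the target's norm, the metric is invariant to the overall scale of each PDE field, which is why it is preferred for comparisons across heterogeneous datasets.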
Researcher Affiliation | Academia | 1 School of Artificial Intelligence and Data Science, University of Science and Technology of China; 2 Suzhou Institute for Advanced Research, University of Science and Technology of China; 3 Suzhou Big Data & AI Research and Engineering Center; 4 Department of Radiation Oncology, University of Kansas Medical Center; 5 School of Mathematical Sciences, University of Science and Technology of China. Correspondence to: Jingrun Chen <EMAIL>.
Pseudocode | Yes | Algorithm 1: Sparse regression for Equations (4) and (5). Input: time-derivative vector ∂_t U_i(θ), candidate function library matrix Φ(θ), threshold tolerance β, maximum iteration number K. Initialize: λ_i = Φ†(θ) ∂_t U_i(θ), i = 1, …, I; k = 0. for k ≤ K do: determine two groups of coefficient indices in λ_i: P = {p : |λ_i^p| < β}, Q = {q : |λ_i^q| ≥ β}. Imp…
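The quoted pseudocode (initialize with a pseudo-inverse solve, then split coefficients by a threshold β) reads like the sequential thresholded least squares used in SINDy-style sparse regression. A hypothetical sketch under that assumption — the function name and the refit-after-prune step are inferred, since the quote is truncated after the index sets are formed:

```python
import numpy as np

def sparse_regression(Phi, dUdt, beta=0.1, K=10):
    """Sequential thresholded least squares (SINDy-style sketch).
    Phi  : (n_samples, n_terms) candidate function library matrix
    dUdt : (n_samples,) time-derivative vector
    beta : threshold tolerance; K : maximum iteration number."""
    # Initialize: lambda = pseudo-inverse of Phi applied to dUdt.
    lam = np.linalg.lstsq(Phi, dUdt, rcond=None)[0]
    for _ in range(K):
        small = np.abs(lam) < beta   # index set P: coefficients below tolerance
        lam[small] = 0.0             # prune them
        big = ~small                 # index set Q: surviving coefficients
        if big.any():
            # Refit the surviving coefficients on the reduced library.
            lam[big] = np.linalg.lstsq(Phi[:, big], dUdt, rcond=None)[0]
    return lam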
Open Source Code | Yes | The code is available at https://github.com/SCAILab-USTC/PITA.
Open Datasets | Yes | Datasets: To enable comparisons with baselines employing the auto-regressive strategy, we select 12 datasets from four different sources: 3 datasets from FNO (Li et al., 2020), 6 datasets from PDEBench (Takamoto et al., 2022), 2 datasets from PDEArena (Gupta & Brandstetter, 2022), and 1 dataset from CFDBench (Luo et al., 2023). For generalization tasks, the Burgers equation from (Boussif et al., 2022) is included as an additional dataset.
Dataset Splits | Yes | The testing datasets comprise PDE data that share the same boundary conditions and parameters as the training data but differ in their initial conditions. Overall, PITA demonstrates state-of-the-art (SOTA) performance in both short trajectory predictions (10 steps) and long trajectory predictions (longer than 10 steps). For FNO-NS-1e-5, the total length of the testing data is 20 steps, while for both FNO-NS-1e-4 and FNO-NS-1e-3 it is 30 steps. The task involves predicting future vorticity steps w(x, t) given the initial 10 steps. The total length of the testing dataset comprises 101 steps. Given the initial 10 steps, the objective is to predict the water depth h(x, t) within the domain [−2.5, 2.5]² × [0, 1]. The total length of the testing dataset consists of 20 steps. The task is to predict the velocity field u(x, t) given the initial 10 time steps.
Hardware Specification | Yes | All experiments are carried out on a single A800 GPU with 80 GB of memory.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | In our experiments, the roll-in window length is set to Tin = 10, and the roll-out window length is set to Tar = 1 or Tar = 10. However, directly applying this strategy may lead to the shortcut problem, causing errors to accumulate as they propagate across the time window. Our framework employs a physics-informed temporal alignment strategy to address this issue.

Table 8: Training Hyperparameter Settings Across Models and Strategies.

Model:                  DPOT                    MPP                         FNO
Strategy:               AR / PITA               AR / PITA                   AR / PITA
Batch Size:             20 / 20                 24 / 24                     20 / 20
Gradient Clipping:      10000 / 10000           1 / 1                       1 / 1
Dropout:                0 / 0                   0.1 / 0.1                   0 / 0
Initial Learning Rate:  1e-3 / 1e-3             1e-3 / 1e-3                 1e-3 / 1e-3
Optimizer:              Adam / Adam             AdamW / AdamW               Adam / Adam
LR Schedule:            Cycle / Cycle           Cycle / Cycle               Cycle / Cycle
Weight Decay:           1e-6 / 1e-6             5e-2 / 5e-2                 1e-6 / 1e-6
Warmup Epochs:          50 / 50                 5 / 5                       50 / 50
Optimizer Momentum:     (0.9, 0.9) / (0.9, 0.9) (0.9, 0.999) / (0.9, 0.999) (0.9, 0.9) / (0.9, 0.9)

The pretrained model is fine-tuned on this dataset for 500 epochs using both auto-regressive training and PITA. The corresponding results, presented in the third column of Table 2, show that PITA consistently outperforms across all model sizes, achieving an average error reduction of 37.8% compared to the original auto-regressive approach. The tested manual settings for (α1, α2) are (0.5, 0.5), (0.3, 0.7), and (0.7, 0.3).
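The roll-in/roll-out strategy quoted above can be sketched as a generic auto-regressive rollout. This is a minimal illustration of why errors accumulate (each prediction is fed back as input), not the paper's implementation; the `model` callable and state shapes are hypothetical:

```python
import numpy as np

def autoregressive_rollout(model, init_states, n_future, T_in=10):
    """Roll out a one-step model auto-regressively.
    model       : maps a (T_in, ...) window of states to the next state
    init_states : (T_in, ...) initial roll-in window (T_in = 10 in the paper)
    n_future    : number of future steps to predict."""
    window = list(init_states)
    preds = []
    for _ in range(n_future):
        nxt = model(np.stack(window[-T_in:]))
        preds.append(nxt)
        # Feeding predictions back into the input window is the source of
        # the error accumulation that PITA's temporal alignment targets.
        window.append(nxt)
    return np.stack(preds)
```

With Tar = 1 the model is trained on single-step targets and only sees its own predictions at test time; with Tar = 10 the training rollout itself spans ten predicted steps, which is where a shortcut solution can propagate errors across the window.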