CViT: Continuous Vision Transformer for Operator Learning
Authors: Sifan Wang, Jacob Seidman, Shyam Sankaran, Hanwen Wang, George Pappas, Paris Perdikaris
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate CViT's effectiveness across a diverse range of partial differential equation (PDE) systems, including fluid dynamics, climate modeling, and reaction-diffusion processes. Our comprehensive experiments show that CViT achieves state-of-the-art performance on multiple benchmarks, often surpassing larger foundation models, even without extensive pretraining and roll-out fine-tuning. |
| Researcher Affiliation | Collaboration | Sifan Wang¹, Jacob H. Seidman³˒⁴˒⁵, Shyam Sankaran³, Hanwen Wang², George J. Pappas⁴, Paris Perdikaris³ — ¹Institution for Foundation of Data Science, Yale University; ²Graduate Program in Applied Mathematics and Computational Science, University of Pennsylvania; ³Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania; ⁴Department of Electrical and Systems Engineering, University of Pennsylvania; ⁵Reality Defender |
| Pseudocode | No | The paper describes the architecture and components of CViT using text, equations, and diagrams (Figure 1), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All data and code are publicly available at https://github.com/PredictiveIntelligenceLab/cvit. |
| Open Datasets | Yes | All data and code are publicly available at https://github.com/PredictiveIntelligenceLab/cvit. We make use of the datasets and problem setup established by de Hoop et al. (2022). The dataset is generated by PDEArena (Gupta & Brandstetter, 2022) using SpeedyWeather.jl... The 2D Navier-Stokes data is generated by PDEArena (Gupta & Brandstetter, 2022)... We use the dataset generated by PDEBench (Takamoto et al., 2022)... |
| Dataset Splits | Yes | For training and evaluation, we considered a split of 20,000 samples used for training, 10,000 for validation, and 10,000 for testing. All models are trained with 5,600 trajectories and evaluated on the remaining 1,000 trajectories. The models are trained with 6,500 trajectories and tested on the remaining 1,300 trajectories. The models are trained with 9,000 trajectories and tested on the remaining 1,000 trajectories. The models are trained with 900 trajectories and tested on the remaining 100 trajectories. |
| Hardware Specification | Yes | All experiments were performed on a single Nvidia RTX A6000 GPU. |
| Software Dependencies | No | We also thank the developers of the software that enabled our research, including JAX (Bradbury et al., 2018), Matplotlib (Hunter, 2007), and NumPy (Harris et al., 2020). The paper mentions software tools used but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We employ the AdamW optimizer (Kingma & Ba, 2014; Loshchilov & Hutter, 2017) with a weight decay of 10⁻⁵. Our learning rate schedule includes an initial linear warm-up phase of 5,000 steps, starting from zero and gradually increasing to 10⁻³, followed by an exponential decay at a rate of 0.9 for every 5,000 steps. The loss function is a one-step mean squared error (MSE)... All models are trained for 2×10⁵ iterations with a batch size B = 64. Within each batch, we randomly sample Q = 1,024 query coordinates from the grid and corresponding output labels. ...we use a patch size of 8×8 for tokenizing inputs. We also employ a decoder with a single cross-attention Transformer block for all configurations. The grid resolution is set to the spatial resolution of each dataset. The latent dimension of grid features is set to 512... we use β = 10⁵ to ensure sufficient locality of the interpolated features. |
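The learning-rate schedule quoted above (linear warm-up over 5,000 steps to 10⁻³, then decay by a factor of 0.9 every 5,000 steps) is easy to misread, so here is a minimal dependency-free sketch. Note this is an illustrative reconstruction, not the authors' code: the paper's experiments use JAX, where the equivalent would typically be `optax.warmup_exponential_decay_schedule` combined with `optax.adamw(weight_decay=1e-5)`, and whether the decay is applied in a staircase fashion (assumed here) or continuously is not stated in the quoted text.

```python
def lr_schedule(step, peak_lr=1e-3, warmup_steps=5000,
                decay_rate=0.9, decay_every=5000):
    """Learning rate at a given training step.

    Linear warm-up from 0 to peak_lr over warmup_steps, then
    staircase exponential decay: multiply by decay_rate once
    every decay_every steps (staircase behavior is an assumption).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * decay_rate ** ((step - warmup_steps) // decay_every)


# Spot-check a few points of the schedule:
print(lr_schedule(2500))    # mid warm-up: 5e-4
print(lr_schedule(5000))    # peak: 1e-3
print(lr_schedule(10000))   # one decay step later: 9e-4
```

Over the full 2×10⁵-iteration run this gives 39 decay steps after warm-up, so the final learning rate is roughly 10⁻³ · 0.9³⁹ ≈ 1.6×10⁻⁵.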