Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Authors: Anthony Zhou, Zijie Li, Michael Schneier, John Buchanan, Amir Barati Farimani
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on both uniform and structured grids, we show that the proposed approach is competitive with current neural PDE solvers in both accuracy and efficiency, with promising scaling behavior up to 3 billion parameters. By introducing a scalable, accurate, and usable physics simulator, we hope to bring neural PDE solvers closer to practical use. [...] 4 EXPERIMENTS We explore the proposed latent diffusion model (LDM) for three PDE datasets: 2D flows around a cylinder (Pfaff et al., 2021), 2D buoyancy-driven smoke flows (Gupta & Brandstetter, 2022), and 3D turbulence around geometric objects (Lienen et al., 2024). [...] For each model, we report its parameter count and relative L2 loss. [...] |
| Researcher Affiliation | Collaboration | Anthony Zhou, Zijie Li & Amir Barati Farimani (Carnegie Mellon University); Michael Schneier & John R. Buchanan, Jr. (Naval Nuclear Laboratory) |
| Pseudocode | No | The paper describes methods and algorithms in prose and uses diagrams (e.g., Figure 2 for architecture overview), but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To further this objective, all code, datasets, and pretrained models are released at: https://github.com/anthonyzhou-1/ldm_pdes |
| Open Datasets | Yes | To further this objective, all code, datasets, and pretrained models are released at: https://github.com/anthonyzhou-1/ldm_pdes 4 EXPERIMENTS We explore the proposed latent diffusion model (LDM) for three PDE datasets: 2D flows around a cylinder (Pfaff et al., 2021), 2D buoyancy-driven smoke flows (Gupta & Brandstetter, 2022), and 3D turbulence around geometric objects (Lienen et al., 2024). |
| Dataset Splits | Yes | 4.1 CYLINDER FLOW Dataset Following Pfaff et al. (2021), we use 1000 samples for training and 100 samples for validation. 4.2 BUOYANCY-DRIVEN FLOW Dataset We use 2496 training samples and 608 validation samples, with varying initial conditions and buoyancy factors, and at a spatial resolution of 128×128 and 48 timesteps, which is compressed to a latent dimension of 6×16×16. 4.3 3D TURBULENCE Dataset We use 36 training samples and 9 validation samples, downsampled to a resolution of 96×24×24 and 1000 timesteps; however, we set the prediction horizon to 48 timesteps during evaluation. |
| Hardware Specification | Yes | F.1 CYLINDER FLOW All baselines were trained on a single NVIDIA RTX 6000 Ada GPU with a learning rate of 10^-5 until convergence. LDM models were trained on a single NVIDIA A100 40GB GPU. F.2 SMOKE BUOYANCY All baselines were trained on a single NVIDIA RTX 6000 Ada GPU with a learning rate of 10^-4 until convergence. LDM models were trained on four NVIDIA A100 40GB GPUs. For large models, four NVIDIA A100 80GB GPUs were used to handle memory requirements. F.3 3D TURBULENCE All baselines were trained on a single NVIDIA A100 GPU with a learning rate of 10^-5 until convergence. LDM models were trained on four NVIDIA A100 80GB GPUs to handle memory requirements. G INFERENCE TIME COMPARISON ...evaluated on a single NVIDIA RTX 6000 Ada GPU or NVIDIA A100 GPU (3D Turbulence). ...PhiFlow was solved faster on our CPU (AMD Ryzen Threadripper PRO 5975WX 32-Cores) than on our GPU (NVIDIA RTX 6000 Ada), so numerical solver times are reported using the CPU. |
| Software Dependencies | No | The paper mentions software like DeepSpeed (Rasley et al., 2020), PhiFlow (Holl & Thuerey, 2024), COMSOL, and OpenFOAM, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | F TRAINING DETAILS To maintain a consistent setup in all experiments, the only ground truth information provided to each model is from the first frame of the simulation. Additionally, to maintain a fair FLOPs comparison, the compute required for autoregressive models during training is multiplied by the number of timesteps. F.1 CYLINDER FLOW All baselines were trained on a single NVIDIA RTX 6000 Ada GPU with a learning rate of 10^-5 until convergence. LDM We train an autoencoder with a latent grid size of 64, GNO radius of 0.0425, hidden dimension of 64, and 3 downsampling layers, resulting in a latent size of 16×16×16, which is a compression ratio of around 48. Additionally, the DiT backbone uses 1000 denoising steps, a patch size of 2, a hidden size of (512/1024), and a depth of (24/28) depending on model size. |
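The fair-FLOPs rule quoted above (training compute for autoregressive models is multiplied by the number of timesteps, since they pay per-step cost over the rollout) can be sketched as a small accounting helper. This is a minimal illustration of the comparison rule only; the function name and the FLOPs value are illustrative assumptions, not numbers from the paper — only the 48-timestep horizon comes from the quoted dataset details.

```python
def effective_training_flops(flops_per_step: float,
                             n_timesteps: int,
                             autoregressive: bool) -> float:
    """Training FLOPs under the paper's fair-comparison rule.

    Autoregressive solvers incur per-timestep compute during training,
    so their cost is scaled by the rollout length; one-shot latent
    diffusion models are charged a single pass.
    """
    if autoregressive:
        return flops_per_step * n_timesteps
    return flops_per_step


# Illustrative comparison over a 48-timestep horizon (as in the
# buoyancy-driven flow / 3D turbulence evaluation settings).
ar_cost = effective_training_flops(1e12, 48, autoregressive=True)
ldm_cost = effective_training_flops(1e12, 48, autoregressive=False)
print(ar_cost / ldm_cost)  # -> 48.0
```

Under this accounting, an autoregressive baseline with the same per-step cost is charged 48x the training compute of a model that predicts the full trajectory in one pass, which is what makes the FLOPs columns comparable across model families.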