Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Authors: Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks, filling a crucial gap in existing methodologies. |
| Researcher Affiliation | Academia | Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long. School of Software, BNRist, Tsinghua University, Beijing 100084, China |
| Pseudocode | Yes | Algorithm 1 Training of Dynamical Diffusion Algorithm 2 Inference of Dynamical Diffusion Algorithm 3 Inverse Dynamics |
| Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/dynamical-diffusion. |
| Open Datasets | Yes | We begin by evaluating the model's performance in scientific spatiotemporal forecasting using the Turbulence Flow dataset (Wang et al., 2020) and the SEVIR dataset (Veillette et al., 2020). ...We further evaluate the model on six multivariate time series datasets: Exchange, Solar, Electricity, Traffic, Taxi, and Wikipedia. These datasets encompass time series with varying dimensionalities, domains, and sampling frequencies. ...All datasets are available through GluonTS (Alexandrov et al., 2019), with detailed information shown in Table 5. |
| Dataset Splits | Yes | Turbulence Flow is a simulated dataset governed by partial differential equations (PDEs)... using 4 input frames to predict the subsequent 11 frames. SEVIR is a large-scale dataset... For this dataset, 7 input frames are used to predict the next 6 frames, with each frame having a resolution of 128×128 grids. The BAIR robot pushing dataset consists of 43k training videos and 256 test videos. ...The goal is to predict 15 future frames based on a single initial frame. The RoboNet dataset consists of 162k videos... we use 256 videos for testing and predict 10 future frames based on 2 input frames. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the framework of Stable Video Diffusion (Blattmann et al., 2023a) and GluonTS (Alexandrov et al., 2019), but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the benchmark datasets, including BAIR, RoboNet, Turbulence, and SEVIR, we utilize the state-of-the-art architecture of Stable Video Diffusion (Blattmann et al., 2023a). ...Table 4 presents the detailed hyperparameters on these datasets. Table 4: Hyperparameters of DyDiff training (columns: BAIR and RoboNet at low resolution 64×64; Turbulence and SEVIR at high resolution 128×128). Input channels: 3 / 3 / 2 / 1; Prediction length: 15 / 10 / 11 / 6; Observation length: 1 / 2 / 4 / 7; Training steps: 5×10⁵ / 5×10⁵ / 3×10⁵ / 4×10⁵. Shared settings: VAE channels [128, 256, 512]; VAE downsampling ratio 4×4; VAE KL weighting 1×10⁻⁶; Latent channels 3; SVD channels [64, 128, 256, 256]; Batch size 16; Learning rate 1×10⁻⁴; LR schedule constant; Optimizer Adam |
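
The per-dataset and shared hyperparameters reported in Table 4 can be transcribed as a plain configuration, which makes the four training setups easy to compare side by side. This is a minimal sketch of that transcription; the helper name `dy_diff_config` and the dict layout are ours, not the authors' code.

```python
# Shared settings from Table 4 of the paper (DyDiff training).
shared = {
    "vae_channels": [128, 256, 512],
    "vae_downsampling_ratio": (4, 4),
    "vae_kl_weight": 1e-6,
    "latent_channels": 3,
    "svd_channels": [64, 128, 256, 256],
    "batch_size": 16,
    "learning_rate": 1e-4,
    "lr_schedule": "constant",
    "optimizer": "adam",
}

# Per-dataset columns of Table 4:
# (input channels, prediction length, observation length, training steps, resolution)
per_dataset = {
    "bair":       (3, 15, 1, 500_000, 64),
    "robonet":    (3, 10, 2, 500_000, 64),
    "turbulence": (2, 11, 4, 300_000, 128),
    "sevir":      (1,  6, 7, 400_000, 128),
}

def dy_diff_config(dataset: str) -> dict:
    """Merge the shared settings with one dataset's column of Table 4."""
    ch, pred, obs, steps, res = per_dataset[dataset]
    return {
        **shared,
        "input_channels": ch,
        "prediction_length": pred,
        "observation_length": obs,
        "training_steps": steps,
        "resolution": res,
    }
```

Note that the prediction/observation lengths match the dataset splits quoted above (e.g., SEVIR uses 7 observed frames to predict 6 at 128×128).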
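
For orientation on what Algorithms 1 and 2 operate over, here is a heavily simplified sketch of building one conditional diffusion training example for temporal prediction: noise the future frames, keep the observed frames clean as conditioning. This is a generic DDPM-style forward process, not the paper's modified Dynamical Diffusion formulation, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000  # number of diffusion steps (assumed; not stated in the quoted text)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal retention

def make_training_example(context, future, t=None):
    """Return (noisy_future, t, target_noise) for one denoising step.

    A denoiser eps_theta(noisy_future, t, context) would be trained
    with MSE against `target_noise`, conditioned on the clean
    observed frames in `context`.
    """
    if t is None:
        t = int(rng.integers(T))
    eps = rng.standard_normal(future.shape)
    noisy = np.sqrt(alpha_bars[t]) * future + np.sqrt(1.0 - alpha_bars[t]) * eps
    return noisy, t, eps
```

Using the SEVIR split quoted above, `context` would hold 7 observed 128×128 frames and `future` the 6 frames to be predicted.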