Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models

Authors: Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long

ICLR 2025

Reproducibility Variable | Result | Evidence (LLM Response)
Research Type: Experimental. "Extensive experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks, filling a crucial gap in existing methodologies."
Researcher Affiliation: Academia. "Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long; School of Software, BNRist, Tsinghua University, Beijing 100084, China"
Pseudocode: Yes. Algorithm 1 (Training of Dynamical Diffusion), Algorithm 2 (Inference of Dynamical Diffusion), Algorithm 3 (Inverse Dynamics).
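For orientation, the kind of training step that Algorithm 1 builds on can be sketched as a generic DDPM-style epsilon-prediction objective. This is only an illustrative sketch of standard diffusion training, not the paper's exact Dynamical Diffusion formulation (which additionally conditions on temporal dynamics); all names here are hypothetical.

```python
import numpy as np

def ddpm_training_step(x0, denoiser, alphas_bar, rng):
    """One generic DDPM-style training step (illustrative only).

    x0: clean sample; denoiser: model eps_theta(x_t, t);
    alphas_bar: cumulative noise schedule; rng: np.random.Generator.
    """
    T = len(alphas_bar)
    t = int(rng.integers(T))                 # sample a diffusion timestep uniformly
    eps = rng.standard_normal(x0.shape)      # Gaussian noise
    a = alphas_bar[t]
    # Forward noising q(x_t | x_0): interpolate between signal and noise.
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    # Epsilon-prediction MSE loss.
    loss = float(np.mean((denoiser(x_t, t) - eps) ** 2))
    return loss
```

With a dummy denoiser that predicts zeros, the loss reduces to the mean squared noise, so it is always finite and non-negative.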
Open Source Code: Yes. "Code is available at this repository: https://github.com/thuml/dynamical-diffusion."
Open Datasets: Yes. "We begin by evaluating the model's performance in scientific spatiotemporal forecasting using the Turbulence Flow dataset (Wang et al., 2020) and the SEVIR dataset (Veillette et al., 2020). ... We further evaluate the model on six multivariate time series datasets: Exchange, Solar, Electricity, Traffic, Taxi, and Wikipedia. These datasets encompass time series with varying dimensionalities, domains, and sampling frequencies. ... All datasets are available through GluonTS (Alexandrov et al., 2019), with detailed information shown in Table 5."
Dataset Splits: Yes. "Turbulence Flow is a simulated dataset governed by partial differential equations (PDEs) ... using 4 input frames to predict the subsequent 11 frames. SEVIR is a large-scale dataset ... For this dataset, 7 input frames are used to predict the next 6 frames, with each frame having a resolution of 128×128 grids. The BAIR robot pushing dataset consists of 43k training videos and 256 test videos. ... The goal is to predict 15 future frames based on a single initial frame. The RoboNet dataset consists of 162k videos ... we use 256 videos for testing and predict 10 future frames based on 2 input frames."
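The observation/prediction windowing described above (e.g. BAIR: 1 input frame, 15 predicted frames; RoboNet: 2 input, 10 predicted) can be sketched as a simple slicing helper. This helper and its name are illustrative, not taken from the paper's code.

```python
def split_context_target(frames, obs_len, pred_len):
    """Split a frame sequence into an observation (context) window and a
    prediction (target) window, matching setups like BAIR (obs_len=1,
    pred_len=15) or RoboNet (obs_len=2, pred_len=10). Illustrative helper."""
    if len(frames) < obs_len + pred_len:
        raise ValueError("sequence too short for the requested windows")
    context = frames[:obs_len]
    target = frames[obs_len:obs_len + pred_len]
    return context, target
```

For a 16-frame BAIR-style clip, `split_context_target(frames, 1, 15)` yields the first frame as context and the remaining 15 as the prediction target.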
Hardware Specification: No. The paper does not specify the exact hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies: No. The paper mentions using the framework of Stable Video Diffusion (Blattmann et al., 2023a) and GluonTS (Alexandrov et al., 2019), but does not provide specific version numbers for these or other software dependencies.
Experiment Setup: Yes. "For the benchmark datasets, including BAIR, RoboNet, Turbulence, and SEVIR, we utilize the state-of-the-art architecture of Stable Video Diffusion (Blattmann et al., 2023a). ... Table 4 presents the detailed hyperparameters on these datasets."

Table 4: Hyperparameters of DyDiff training.

| Hyperparameter | BAIR (64×64) | RoboNet (64×64) | Turbulence (128×128) | SEVIR (128×128) |
|---|---|---|---|---|
| Input channels | 3 | 3 | 2 | 1 |
| Prediction length | 15 | 10 | 11 | 6 |
| Observation length | 1 | 2 | 4 | 7 |
| Training steps | 5×10⁵ | 5×10⁵ | 3×10⁵ | 4×10⁵ |
| VAE channels | [128, 256, 512] (all datasets) | | | |
| VAE downsampling ratio | 4 (all datasets) | | | |
| VAE KL weighting | 1×10⁻⁶ (all datasets) | | | |
| Latent channels | 3 (all datasets) | | | |
| SVD channels | [64, 128, 256, 256] (all datasets) | | | |
| Batch size | 16 (all datasets) | | | |
| Learning rate | 1×10⁻⁴ (all datasets) | | | |
| LR schedule | Constant (all datasets) | | | |
| Optimizer | Adam (all datasets) | | | |
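The Table 4 hyperparameters split naturally into per-dataset and shared settings, which can be transcribed as a configuration dictionary. The values below come from the paper's Table 4; the dictionary layout, key names, and `full_config` helper are purely an illustrative way to organize them, not the repository's actual config format.

```python
# Per-dataset hyperparameters (values transcribed from Table 4).
DYDIFF_HPARAMS = {
    "BAIR":       {"resolution": 64,  "input_channels": 3, "pred_len": 15, "obs_len": 1, "train_steps": 500_000},
    "RoboNet":    {"resolution": 64,  "input_channels": 3, "pred_len": 10, "obs_len": 2, "train_steps": 500_000},
    "Turbulence": {"resolution": 128, "input_channels": 2, "pred_len": 11, "obs_len": 4, "train_steps": 300_000},
    "SEVIR":      {"resolution": 128, "input_channels": 1, "pred_len": 6,  "obs_len": 7, "train_steps": 400_000},
}

# Settings shared across all four datasets.
SHARED_HPARAMS = {
    "vae_channels": [128, 256, 512],
    "vae_downsampling_ratio": 4,
    "vae_kl_weight": 1e-6,
    "latent_channels": 3,
    "svd_channels": [64, 128, 256, 256],
    "batch_size": 16,
    "learning_rate": 1e-4,
    "lr_schedule": "constant",
    "optimizer": "Adam",
}

def full_config(dataset):
    """Merge shared settings with dataset-specific ones (hypothetical helper)."""
    return {**SHARED_HPARAMS, **DYDIFF_HPARAMS[dataset]}
```

For example, `full_config("SEVIR")` combines the shared learning rate of 1×10⁻⁴ with SEVIR's 7-frame observation window.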