DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

Authors: Tianyu Huang, Haoze Zhang, Yihan Zeng, Zhilu Zhang, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results demonstrate that our method enjoys more realistic motions than state-of-the-arts. Extensive experiments show that our results enjoy more realistic motion simulation. Extensive ablation studies are then conducted to demonstrate the effectiveness of our newly proposed components." |
| Researcher Affiliation | Collaboration | 1 Harbin Institute of Technology; 2 City University of Hong Kong; 3 Huawei Noah's Ark Lab |
| Pseudocode | No | The paper describes its methods using equations and textual explanations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code: https://github.com/tyhuang0428/DreamPhysics" |
| Open Datasets | Yes | "Dataset. We collect seven 3D static scenes or objects from previous works (Xie et al. 2023; Zhang et al. 2024) and 3D GS generative models (Tang et al. 2024)." |
| Dataset Splits | No | The paper mentions collecting 3D static scenes and using VBench for evaluation, but it does not specify training/validation/test splits or any other dataset partitioning strategy needed for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models or memory capacity used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as Warp, ModelScope, Stable Video Diffusion, a KAN-based model, and the LAION aesthetic predictor, but it does not specify their version numbers or the versions of general languages and libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | "For most simulation scenes, we set the simulation duration as 5×10⁻⁵ second and the frame duration as 4×10⁻² second. Thus, we simulate 800 steps between every two renderings and include the simulation gradient of the last step in the optimization." The numbers of generated video frames T are 16 and 25, respectively. For frame boosting, M = 5, splitting the video slices into 5 groups. The MDS setting follows SDS, with the CFG value set to 100. Training is stopped once the optimized parameter values stabilize within one order of magnitude, which requires around 30 iterations. |
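The timing figures in the Experiment Setup row are internally consistent, and the relationship is easy to verify. A minimal sketch (not the authors' code; the variable names are illustrative assumptions) that derives the reported 800 substeps per rendering from the two stated durations, and the frame-boosting group size from M = 5 with T = 25 frames:

```python
# Hedged sketch: verify the step counts implied by the paper's
# reported settings. Not taken from the DreamPhysics codebase.

substep_dt = 5e-5   # simulation (substep) duration, seconds
frame_dt = 4e-2     # frame duration, seconds
T = 25              # video frames (Stable Video Diffusion setting)
M = 5               # frame-boosting group count

# Substeps simulated between every two renderings:
steps_per_frame = round(frame_dt / substep_dt)
print(steps_per_frame)  # 800, matching the paper

# Frame boosting splits the T frames into M groups of slices:
frames_per_group = T // M
print(frames_per_group)  # 5 frames per group when T = 25
```

Only the gradient of the last of these 800 substeps is kept in the optimization, per the quoted setup.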