MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyu, Peng Wang, Wenping Wang, Junhui Hou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms state-of-the-art methods by a significant margin. Project page: https://MoDGS.github.io |
| Researcher Affiliation | Academia | ¹City University of Hong Kong ²HKUST ³HKU ⁴TAMU ⁵CUHK(SZ) |
| Pseudocode | No | The paper describes the proposed method in Section 3 and its subsections, using explanatory text and figures (Figure 2: Overview, Figure 3: Initialization of deformation field and Gaussians), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://MoDGS.github.io. This is a project page, which is considered a high-level overview page rather than a specific code repository. |
| Open Datasets | Yes | We conducted experiments on four datasets to demonstrate the effectiveness of our method. The first dataset is the DyNeRF (Li et al., 2022) dataset, which consists of 6 scenes... The second dataset is the Nvidia (Yoon et al., 2020) dataset... We also present results of the DAVIS dataset (Pont-Tuset et al., 2017) in Sec. A.6 of the appendix. |
| Dataset Splits | Yes | We use camera0 for training and evaluate the results on camera5 and camera6. The second dataset is the Nvidia (Yoon et al., 2020) dataset... We train all methods on camera4 and evaluate with camera3 and camera5. |
| Hardware Specification | Yes | The whole training takes around 3.5 hours to converge (2 hours for the initialization and 1.5 hours for the subsequent optimization) on an NVIDIA RTX A6000 GPU, which uses about 14G memory. |
| Software Dependencies | No | We implement our MoDGS with PyTorch. To initialize the deformation field, we train it with 20k steps as stated in Sec. 3.2. Subsequently, we jointly train the 3D Gaussians and the deformation field with the rendering loss and the ordinal depth loss for another 20k steps. In Sec. 3.2, the flow is computed on evenly sampled key frames (e.g., 1/5). The downsampling voxel size for Gaussian initialization is 0.0043 (scenes are normalized to [−1, 1]³). For the outer optimization loop and rendering loss, we exactly follow the original 3DGS, and we use Gaussian centers to render depth (Yang et al., 2023). We adopt an Adam optimizer for optimization. |
| Experiment Setup | Yes | To initialize the deformation field, we train it with 20k steps as stated in Sec. 3.2. Subsequently, we jointly train the 3D Gaussians and the deformation field with the rendering loss and the ordinal depth loss for another 20k steps... The learning rate for the 3D Gaussians exactly follows the official implementation of 3DGS (Kerbl et al., 2023), while the learning rate of the deformation network undergoes exponential decay from 1e-3 to 1e-4 during initialization and from 1e-4 to 1e-6 in the subsequent optimization. We set α = 100 for ℓ_ordinal. The weight of our depth order loss is 0.1. When computing the depth ordinal loss, we first normalize the depth range to [0, 1] and only consider depth pairs with a difference larger than 0.02 for loss computation. |
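The ordinal depth loss hyperparameters quoted in the experiment-setup row (α = 100, depth normalization to [0, 1], a 0.02 pair-difference threshold, loss weight 0.1) can be made concrete with a minimal PyTorch sketch. This is an illustration under stated assumptions, not the paper's implementation: the tanh-versus-sign formulation and the random pair sampling below are assumed, since the quoted text only fixes the hyperparameters.

```python
import torch

def ordinal_depth_loss(rendered, prior, alpha=100.0, min_diff=0.02, n_pairs=4096):
    """Hedged sketch of an ordinal depth loss.

    Compares the depth ORDER of random pixel pairs in the rendered depth
    map against the order given by a monocular depth prior. The exact
    tanh-vs-sign form is an assumption; the paper only states alpha = 100,
    [0, 1] normalization, and a 0.02 pair threshold (loss weight 0.1).
    """
    def normalize(d):
        # Normalize the depth range to [0, 1], as stated in the paper.
        return (d - d.min()) / (d.max() - d.min() + 1e-8)

    r = normalize(rendered.flatten())
    p = normalize(prior.flatten())

    # Sample random pixel pairs (sampling strategy is assumed).
    idx_a = torch.randint(0, r.numel(), (n_pairs,))
    idx_b = torch.randint(0, r.numel(), (n_pairs,))

    # Only consider pairs whose prior-depth difference exceeds 0.02.
    prior_diff = p[idx_a] - p[idx_b]
    mask = prior_diff.abs() > min_diff
    if mask.sum() == 0:
        return r.sum() * 0.0  # no valid pairs; zero loss, keeps the graph

    # Soft order of the rendered pair vs. hard order of the prior pair:
    # with alpha = 100, tanh(alpha * diff) saturates toward ±1 for any
    # pair passing the 0.02 threshold.
    soft_order = torch.tanh(alpha * (r[idx_a] - r[idx_b]))[mask]
    hard_order = torch.sign(prior_diff)[mask]
    return (soft_order - hard_order).abs().mean()
```

In training this term would be added to the rendering loss with the quoted weight of 0.1. Note that the loss is near zero when rendered and prior depths agree in order, and approaches 2 per pair when every order is flipped.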