MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyu, Peng Wang, Wenping Wang, Junhui Hou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms state-of-the-art methods by a significant margin. Project page: https://MoDGS.github.io |
| Researcher Affiliation | Academia | ¹City University of Hong Kong ²HKUST ³HKU ⁴TAMU ⁵CUHK(SZ) |
| Pseudocode | No | The paper describes the proposed method in Section 3 and its subsections, using explanatory text and figures (Figure 2: Overview, Figure 3: Initialization of deformation field and Gaussians), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://MoDGS.github.io. This is a project page, which is considered a high-level overview page rather than a specific code repository. |
| Open Datasets | Yes | We conducted experiments on four datasets to demonstrate the effectiveness of our method. The first dataset is the DyNeRF (Li et al., 2022) dataset, which consists of 6 scenes... The second dataset is the Nvidia (Yoon et al., 2020) dataset... We also present results of the DAVIS dataset (Pont-Tuset et al., 2017) in Sec. A.6 of the appendix. |
| Dataset Splits | Yes | We use camera0 for training and evaluate the results on camera5 and camera6. The second dataset is the Nvidia (Yoon et al., 2020) dataset... We train all methods on camera4 and evaluate with camera3 and camera5. |
| Hardware Specification | Yes | The whole training takes around 3.5 hours to converge (2 hours for the initialization and 1.5 hours for the subsequent optimization) on an NVIDIA RTX A6000 GPU, which uses about 14G memory. |
| Software Dependencies | No | We implement our MoDGS with PyTorch. To initialize the deformation field, we train it with 20k steps as stated in Sec. 3.2. Subsequently, we jointly train the 3D Gaussians and the deformation field with the rendering loss and the ordinal depth loss for another 20k steps. In Sec. 3.2, the flow is computed on evenly sampled key frames (e.g., 1/5). The downsampling voxel size for Gaussian initialization is 0.0043 (scenes are normalized to [−1, 1]³). For the outer optimization loop and rendering loss, we exactly follow the original 3DGS, and we use Gaussian centers to render depth (Yang et al., 2023). We adopt an Adam optimizer for optimization. |
| Experiment Setup | Yes | To initialize the deformation field, we train it with 20k steps as stated in Sec. 3.2. Subsequently, we jointly train the 3D Gaussians and the deformation field with the rendering loss and the ordinal depth loss for another 20k steps... The learning rate for the 3D Gaussians exactly follows the official implementation of 3DGS (Kerbl et al., 2023), while the learning rate of the deformation network undergoes exponential decay from 1e-3 to 1e-4 during initialization and from 1e-4 to 1e-6 in the subsequent optimization. We set α = 100 for ℓ_ordinal. The weight of our depth order loss is 0.1. When computing the depth ordinal loss, we first normalize the depth range to [0, 1] and only consider depth pairs with a difference larger than 0.02 for loss computation. |
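The ordinal depth loss hyperparameters quoted in the experiment-setup row (α = 100, depth normalization to [0, 1], a 0.02 pair-difference threshold, loss weight 0.1) can be made concrete with a minimal PyTorch sketch. This is an illustration under stated assumptions, not the paper's implementation: the tanh-versus-sign formulation and the random pair sampling below are assumed, since the quoted text only fixes the hyperparameters.

```python
import torch

def ordinal_depth_loss(rendered, prior, alpha=100.0, min_diff=0.02, n_pairs=4096):
    """Hedged sketch of an ordinal depth loss.

    Compares the depth ORDER of random pixel pairs in the rendered depth
    map against the order given by a monocular depth prior. The exact
    tanh-vs-sign form is an assumption; the paper only states alpha = 100,
    [0, 1] normalization, and a 0.02 pair threshold (loss weight 0.1).
    """
    def normalize(d):
        # Normalize the depth range to [0, 1], as stated in the paper.
        return (d - d.min()) / (d.max() - d.min() + 1e-8)

    r = normalize(rendered.flatten())
    p = normalize(prior.flatten())

    # Sample random pixel pairs (sampling strategy is assumed).
    idx_a = torch.randint(0, r.numel(), (n_pairs,))
    idx_b = torch.randint(0, r.numel(), (n_pairs,))

    # Only consider pairs whose prior-depth difference exceeds 0.02.
    prior_diff = p[idx_a] - p[idx_b]
    mask = prior_diff.abs() > min_diff
    if mask.sum() == 0:
        return r.sum() * 0.0  # no valid pairs; zero loss, keeps the graph

    # Soft order of the rendered pair vs. hard order of the prior pair:
    # with alpha = 100, tanh(alpha * diff) saturates toward ±1 for any
    # pair passing the 0.02 threshold.
    soft_order = torch.tanh(alpha * (r[idx_a] - r[idx_b]))[mask]
    hard_order = torch.sign(prior_diff)[mask]
    return (soft_order - hard_order).abs().mean()
```

In training this term would be added to the rendering loss with the quoted weight of 0.1. Note that the loss is near zero when rendered and prior depths agree in order, and approaches 2 per pair when every order is flipped.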