Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors
Authors: Soumava Paul, Prakhar Kaushik, Alan Yuille
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods in complex 360° scenes. Our project page provides additional results, videos, and code. We compare GScenes with state-of-the-art pose-free and posed sparse-view reconstruction methods in Figs. 9, 10 and Tables 1, 2. We also ablate the different components and design choices of our diffusion model. |
| Researcher Affiliation | Academia | Soumava Paul, Prakhar Kaushik, Alan Yuille CCVL, Johns Hopkins University EMAIL |
| Pseudocode | Yes | Algorithm 1 Gaussian Scenes Training |
| Open Source Code | Yes | Our project page (https://gaussianscenes.github.io) provides additional results, videos, and code. ... An open-source low-cost solution with lower data and compute requirements compared to state-of-the-art posed reconstruction methods. |
| Open Datasets | Yes | Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques... We evaluate GScenes on the 9 scenes of the MipNeRF360 dataset (Barron et al., 2022), and 15 scenes (out of 140) of the DL3DV-10K benchmark dataset. ... We fine-tune our diffusion model on a mix of 1043 scenes encompassing Tanks and Temples (Knapitsch et al., 2017), CO3D (Reizenstein et al., 2021), Deep Blending (Hedman et al., 2018), and the 1k subset of DL3DV-10K (Ling et al., 2024) to obtain a total of 171,461 data samples. |
| Dataset Splits | Yes | For MipNeRF360, we pick the M-view splits as proposed by ReconFusion and CAT3D and evaluate all baselines on the official test views, where every 8th image is held out for testing. For DL3DV-10K scenes, we create M-view splits using a greedy view-selection heuristic that maximizes scene coverage given a set of dense training views, similar to the heuristic proposed in Wu et al. (2024). For test views, we hold out every 8th image as in MipNeRF360. For a given scene, we fit sparse models for M ∈ {3, 6, 9, 18} views. |
| Hardware Specification | Yes | GScenes is implemented in PyTorch 2.3.1 on single A5000/A6000 GPUs. Finetuning this model takes about 4 days on a single A6000 GPU. GScenes completes full 3D reconstruction in approximately 5 minutes on a single A6000 GPU. |
| Software Dependencies | Yes | GScenes is implemented in PyTorch 2.3.1 on single A5000/A6000 GPUs. |
| Experiment Setup | Yes | The diffusion model is finetuned for 100k iterations (batch size 16, learning rate 1e-4) with conditioning-element dropout probability of 0.05 for CFG. Following InstantSplat, we fit 3D Gaussians to sparse inputs and MASt3R point clouds for 1k iterations to obtain G. We use classifier-free guidance scales s_I = s_C = 3.0 and sample with k = 20 DDIM steps. We linearly decay w_d from 1 to 0.01 and the L_sample weight from 1 to 0.1 over 10k iterations. We finetune this VAE on a subset of our dataset for 5000 training steps with batch size 16 and learning rate 1e-5. |
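The split convention quoted above (every 8th image held out for testing, with M training views drawn from the remainder) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_splits` is hypothetical, and even subsampling of the training pool is a simple stand-in for the paper's greedy coverage heuristic, which requires camera/point-cloud information not modeled here.

```python
def make_splits(image_paths, num_train_views):
    """Hold out every 8th image for testing (the MipNeRF360 convention
    described above); subsample M training views from the rest."""
    # Every 8th image (indices 0, 8, 16, ...) is reserved for evaluation.
    test = [p for i, p in enumerate(image_paths) if i % 8 == 0]
    pool = [p for i, p in enumerate(image_paths) if i % 8 != 0]
    # Evenly spread M picks over the pool -- a placeholder for the
    # greedy view-selection heuristic used for DL3DV-10K in the paper.
    idx = [round(j * (len(pool) - 1) / max(num_train_views - 1, 1))
           for j in range(num_train_views)]
    train = [pool[i] for i in idx]
    return train, test
```

For example, with 24 images and M = 3, the test set is images 0, 8, and 16, and the three training views are spread across the remaining 21 frames.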
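The experiment setup decays two loss weights linearly (w_d from 1 to 0.01, the L_sample weight from 1 to 0.1) over 10k iterations. A generic linear-decay schedule matching that description might look like the sketch below; the function name and the hold-at-end behavior after 10k steps are assumptions, not confirmed details from the paper.

```python
def linear_decay(step, total_steps=10_000, start=1.0, end=0.01):
    """Linearly anneal a loss weight from `start` to `end` over
    `total_steps` optimization steps, then hold it at `end`."""
    t = min(step / total_steps, 1.0)  # fraction of the decay completed
    return start + t * (end - start)
```

With the defaults this reproduces the quoted w_d schedule (1.0 at step 0, 0.01 at step 10k); passing `end=0.1` gives the L_sample weight schedule.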