SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding

Authors: Peng Ling, Tiao Tan, Jiaqi Lin, Wenming Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments demonstrate that SOVGaussian is able to reconstruct a superior scene representation from few-shot images, outperforming existing state-of-the-art methods and achieving significantly better performance on novel view language querying and synthesis. Experimental results demonstrate that our method outperforms existing state-of-the-art methods, achieving up to a 56.9% improvement in mIoU compared to LangSplat on the 3DOVS dataset and up to a 36% improvement on the DTU dataset. Ablation Study: Here, we conduct ablations on the 3DOVS dataset to evaluate the performance increment contributed by each component, including open-vocabulary querying accuracy and synthesis quality from novel views.
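The headline metric above is mIoU (mean intersection-over-union). As a point of reference only — this is a generic sketch of the standard metric, not the paper's evaluation code — mIoU over a predicted and ground-truth segmentation can be computed as:

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean IoU over classes that appear in prediction or ground truth.

    pred, gt: integer label maps of identical shape.
    Classes absent from both maps are skipped so they do not
    artificially inflate or deflate the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

A "56.9% improvement" in such a ratio metric can mean either an absolute or a relative gain; the report does not disambiguate.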
Researcher Affiliation | Academia | Shenzhen International Graduate School, Tsinghua University; EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (e.g., Equations 1-16), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Repository: https://github.com/Brucess/SOVGaussian
Open Datasets | Yes | We evaluate our method on the 3DOVS (Liu et al. 2023) and DTU datasets (Aanæs et al. 2016).
Dataset Splits | Yes | Different from their vanilla pipelines that use all views (i.e., 35 for 3DOVS and 49 for DTU) for training, we use only 3 views and evaluate generalization on novel views. To ensure fair comparison, all methods are trained following the same sparse-view protocol as ours, using the same 3 input views, camera poses, and test views. View selection follows uniform sampling for 3DOVS and the protocol in (Li et al. 2024) for DTU.
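The paper does not spell out what "uniform sampling" means for view selection; one plausible reading — a sketch, not the authors' code — is evenly spaced indices over the available views:

```python
import numpy as np

def uniform_view_ids(total_views, n_train=3):
    """Hypothetical uniform view selection: pick n_train indices
    evenly spaced over [0, total_views - 1].

    For 3DOVS (35 views) this yields [0, 17, 34]; the remaining
    views would serve as held-out novel views for evaluation.
    """
    return np.linspace(0, total_views - 1, n_train).round().astype(int).tolist()
```

Anyone reproducing the split should check the released repository for the exact indices, since a different convention (e.g., excluding endpoint views) changes the training set.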
Hardware Specification | Yes | We train for 20,000 iterations on the 3DOVS dataset and 6,000 iterations on the DTU dataset using a single RTX 3090, requiring approximately 1 hour and 25 minutes, respectively, using around 4GB of memory.
Software Dependencies | No | "Our approach is based on 3DGS (Kerbl et al. 2023) and implemented by PyTorch." While PyTorch is mentioned, no specific version number is provided for it or any other software dependency.
Experiment Setup | Yes | We train for 20,000 iterations on the 3DOVS dataset and 6,000 iterations on the DTU dataset... We empirically set γ to 20, τ = 5% for LOP, and λ = 0.1 for the loss function. We set the interval for LOP to 1000 iterations... We further control hyperparameters such as learning rate and density increment percentage to enhance the baselines' performance.
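The reported hyperparameters can be collected into one place for a reproduction attempt. The dictionary below is a hypothetical config — key names are illustrative and do not come from the official repository; only the values are quoted from the paper:

```python
# Hypothetical reproduction config for SOVGaussian.
# Values are taken from the paper's experiment setup;
# the key names themselves are assumptions, not the repo's.
SOVGAUSSIAN_CONFIG = {
    "iterations": {"3dovs": 20_000, "dtu": 6_000},
    "gamma": 20,           # γ, set empirically
    "lop_tau": 0.05,       # τ = 5% threshold for LOP
    "loss_lambda": 0.1,    # λ weighting in the loss function
    "lop_interval": 1_000, # iterations between LOP steps
}
```

Note that the learning rate and density increment percentage are stated to be tuned per baseline but their values are not quoted here, so they are deliberately omitted.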