SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding
Authors: Peng Ling, Tiao Tan, Jiaqi Lin, Wenming Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments demonstrate that SOVGaussian is able to reconstruct a superior scene representation from few-shot images, outperforming existing state-of-the-art methods and achieving significantly better performance on novel view language querying and synthesis. Experimental results demonstrate that our method outperforms existing state-of-the-art methods, achieving up to a 56.9% improvement in mIoU compared to LangSplat on the 3DOVS dataset and up to a 36% improvement on the DTU dataset. Ablation Study: Here, we conduct ablations on the 3DOVS dataset to evaluate the performance increment contributed by each component, including open-vocabulary querying accuracy and synthesis quality from novel views. |
| Researcher Affiliation | Academia | Shenzhen International Graduate School, Tsinghua University EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (e.g., Equations 1-16), but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Repository https://github.com/Brucess/SOVGaussian |
| Open Datasets | Yes | We evaluate our method on the 3DOVS (Liu et al. 2023) and DTU datasets (Aanæs et al. 2016). |
| Dataset Splits | Yes | Different from their vanilla pipelines that use all views (i.e., 35 for 3DOVS and 49 for DTU) for training, we use only 3 views and evaluate generalization on novel views. To ensure fair comparison, all methods are trained following the same sparse-view protocol as ours, using the same 3 input views, camera poses, and test views. View selection follows uniform sampling for 3DOVS and the protocol in (Li et al. 2024) for DTU. |
| Hardware Specification | Yes | We train for 20,000 iterations on the 3DOVS dataset and 6,000 iterations on the DTU dataset using a single RTX 3090, requiring approximately 1 hour and 25 minutes, respectively, using around 4GB of memory. |
| Software Dependencies | No | Our approach is based on 3DGS (Kerbl et al. 2023) and implemented by PyTorch. While PyTorch is mentioned, no specific version number is provided for it or any other software dependency. |
| Experiment Setup | Yes | We train for 20,000 iterations on the 3DOVS dataset and 6,000 iterations on the DTU dataset... We empirically set γ to 20, τ = 5% for LOP, and λ = 0.1 for the loss function. We set the interval for LOP to 1000 iterations... We further control hyperparameters such as learning rate and density increment percentage to enhance the baselines' performance. |
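The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration for reproduction attempts. The sketch below is an assumption for illustration only: the key names and structure are hypothetical, and the official repository may organize these values differently.

```python
# Hypothetical reproduction config assembled from the values reported in the
# paper. Key names (e.g. "lop_tau", "lop_interval") are illustrative
# assumptions, not identifiers from the authors' code.
TRAIN_CONFIG = {
    "datasets": {
        "3DOVS": {"iterations": 20_000, "input_views": 3},  # 3 sparse views
        "DTU":   {"iterations": 6_000,  "input_views": 3},
    },
    "gamma": 20,           # γ = 20, as reported
    "lop_tau": 0.05,       # τ = 5% for LOP
    "lambda_loss": 0.1,    # λ = 0.1 weighting in the loss function
    "lop_interval": 1000,  # LOP applied every 1000 iterations
}

def iterations_for(dataset: str) -> int:
    """Return the reported training iteration count for a given dataset."""
    return TRAIN_CONFIG["datasets"][dataset]["iterations"]
```

With this layout, a reproduction script could look up per-dataset settings (e.g. `iterations_for("DTU")` returns 6000) while keeping the shared method hyperparameters in one place.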