3D Vision-Language Gaussian Splatting

Authors: Qucheng Peng, Benjamin Planche, Zhongpai Gao, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Chen Chen, Ziyan Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin.
Researcher Affiliation | Collaboration | 1 Center for Research in Computer Vision, University of Central Florida, Orlando, FL, USA; 2 United Imaging Intelligence, Boston, MA, USA; EMAIL, {first.last}@uii-ai.com, EMAIL
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | We will also release the source code of our method upon acceptance of the paper.
Open Datasets | Yes | Datasets. We employ 3 datasets for our evaluation on open-vocabulary semantic tasks. (1) LERF dataset (Kerr et al., 2023)... (2) 3D-OVS dataset (Liu et al., 2023)... (3) Mip-NeRF 360 dataset (Barron et al., 2022)... Additionally, all datasets used in this research are publicly available to the community.
Dataset Splits | No | The paper mentions using the LERF, 3D-OVS, and Mip-NeRF 360 datasets and refers to evaluation protocols from other papers for metrics, but does not explicitly state how these datasets were split into training, validation, or test sets.
Hardware Specification | Yes | All experiments are conducted on Nvidia A100 GPUs.
Software Dependencies | No | The implementation of semantic opacity is done in CUDA and C++, while the other components are in PyTorch.
Experiment Setup | Yes | For modality fusion, we set dc and df to 3, while dh is set to 4. During rasterization, the smoothed semantic indicator is initialized in the same manner as color opacity. For each iteration, two camera views and their associated images are selected, and the 3D Gaussians are trained for 15,000 iterations. Moreover, the parameter λ in Equation 12 is set to 1.2. The learning rates applied to the different 3DGS attributes are provided in Tab. 7.
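The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The field and class names below are hypothetical (the paper's code is not released); only the numeric values are taken from the text, and the per-attribute learning rates are omitted because they live in the paper's Tab. 7:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported 3DGS training settings."""
    d_c: int = 3              # modality-fusion dimension d_c (paper: 3)
    d_f: int = 3              # modality-fusion dimension d_f (paper: 3)
    d_h: int = 4              # modality-fusion dimension d_h (paper: 4)
    views_per_iter: int = 2   # camera views (and images) sampled per iteration
    num_iters: int = 15_000   # total training iterations for the 3D Gaussians
    lambda_loss: float = 1.2  # λ in Equation 12 of the paper


cfg = TrainConfig()
```

A frozen dataclass is used here only to make the reported values explicit and immutable for a reproduction attempt; the actual code structure in the authors' implementation is unknown.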