3D Vision-Language Gaussian Splatting
Authors: Qucheng Peng, Benjamin Planche, Zhongpai Gao, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Chen Chen, Ziyan Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin. |
| Researcher Affiliation | Collaboration | 1Center for Research in Computer Vision, University of Central Florida, Orlando, FL, USA 2United Imaging Intelligence, Boston, MA, USA EMAIL, {first.last}@uii-ai.com, EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will also release the source code of our method upon acceptance of the paper. |
| Open Datasets | Yes | Datasets. We employ 3 datasets for our evaluation on open-vocabulary semantic tasks. (1) LERF dataset (Kerr et al., 2023)... (2) 3D-OVS dataset (Liu et al., 2023)... (3) Mip-NeRF 360 dataset (Barron et al., 2022)... Additionally, all datasets used in this research are publicly available to the community. |
| Dataset Splits | No | The paper mentions using LERF, 3D-OVS, and Mip-NeRF 360 datasets, and refers to evaluation protocols from other papers for metrics, but does not explicitly provide details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | Yes | All experiments are conducted on Nvidia A100 GPUs. |
| Software Dependencies | No | The implementation of semantic opacity is done in CUDA and C++, while the other components are in PyTorch. |
| Experiment Setup | Yes | For modality fusion, we set dc and df to 3, while dh is set to 4. During rasterization, the smoothed semantic indicator is initialized in the same manner as color opacity. For each iteration, two camera views and their associated images are selected, and 3D Gaussians are trained for 15,000 iterations. Moreover, the parameter λ in Equation 12 is configured to 1.2. The learning rates applied to the different 3DGS attributes are provided in Tab. 7. |
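The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch for anyone attempting a reproduction. This is a hypothetical illustration only: the field names and dict layout are our own, since the paper does not publish a config schema, and the per-attribute learning rates it defers to its Tab. 7 are deliberately left out.

```python
# Hypothetical config sketch assembling the hyperparameters reported
# in the paper's experiment setup; all key names are assumptions.
experiment_config = {
    "modality_fusion": {
        "d_c": 3,  # reported dc
        "d_f": 3,  # reported df
        "d_h": 4,  # reported dh
    },
    "training": {
        "views_per_iteration": 2,    # camera views sampled per step
        "total_iterations": 15_000,  # 3D Gaussian training iterations
        "lambda_eq12": 1.2,          # loss weight λ from Equation 12
    },
    "hardware": "Nvidia A100",       # GPU reported for all experiments
    # Per-attribute 3DGS learning rates are in the paper's Tab. 7
    # and are intentionally omitted here.
}

print(experiment_config["training"]["total_iterations"])
```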