econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
Authors: Can Zhang, Gim Hee Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a series of experiments to demonstrate the effectiveness of our proposed method across various 3D scene understanding tasks. We evaluate our method on the 2D semantic segmentation benchmarks: ScanNet (Dai et al., 2017) and Replica (Straub et al., 2019), and 3D open-vocabulary segmentation benchmarks: LERF (Kerr et al., 2023) and 3DOVS (Liu et al., 2024) to compare with previous work, and provide results from ablation studies. |
| Researcher Affiliation | Academia | Can Zhang & Gim Hee Lee, Department of Computer Science, National University of Singapore, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology and components (CRR, Low-Dimensional 3D Contextual Space, 3DGS Semantic Fields) using prose and mathematical equations. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Our source code is available at: https://lulusindazc.github.io/econSGproject/. |
| Open Datasets | Yes | We evaluate our method on the 2D semantic segmentation benchmarks: ScanNet (Dai et al., 2017) and Replica (Straub et al., 2019), and 3D open-vocabulary segmentation benchmarks: LERF (Kerr et al., 2023) and 3DOVS (Liu et al., 2024). |
| Dataset Splits | Yes | For both ScanNet and Replica, we construct training and test sets by evenly sampling sequences in each scene. ... For LERF and 3DOVS, we follow the settings in LangSplat (Qin et al., 2023), where LERF is extended with ground-truth masks annotated for language queries and 3DOVS consists of 20–30 images per scene at a resolution of 4032×3024. ... We also perform a robustness comparison by evenly sampling sparse training views for optimization (30 images per scene in our experiments). |
| Hardware Specification | Yes | For all datasets, we train each scene for 30K iterations on one NVIDIA RTX-4090 GPU. |
| Software Dependencies | No | The paper mentions using 'OpenSeg (Ghiasi et al., 2022)', 'LSeg (Li et al., 2022)', 'OpenCLIP (Ilharco et al., 2021)', 'SAM' and the 'Adam optimizer'. However, no specific version numbers for these software components or any programming language environments (e.g., Python, PyTorch versions) are provided. |
| Experiment Setup | Yes | We then use SAM for mutual refinement with the 2D VLMs in our CRR to get the semantic features, where we set $\tau_1 = 0.45$, $\tau_2 = 0.6$. We use the Adam optimizer with a learning rate of 0.0025 for the latent semantic fields. For the parameters used to train the image scene, we follow the default settings of the original 3DGS (Kerbl et al., 2023). For the additional parameters introduced to train the semantic scene, we set $\lambda_{\text{sem}} = 1$, $\lambda_{\text{2d}} = 1$. For all datasets, we train each scene for 30K iterations on one NVIDIA RTX-4090 GPU. |
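The Dataset Splits row describes evenly sampling sparse training views (30 images per scene). The report does not say exactly how "evenly" is computed, so the following is only a minimal sketch under one plausible assumption: pick equally spaced frame indices over a sequence, always keeping the first and last frame. The function name `evenly_sample_indices` is hypothetical, and how the held-out test frames are chosen is not specified here.

```python
def evenly_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return `num_samples` distinct frame indices spread evenly over a sequence.

    Hypothetical helper (ASSUMPTION: equal spacing with both endpoints kept);
    not taken from the econSG code release.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        # Not enough frames to subsample: use every frame.
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    # Integer floor division keeps the indices exact and distinct: the spacing
    # (num_frames - 1) / (num_samples - 1) is > 1 whenever num_samples < num_frames.
    return [i * (num_frames - 1) // (num_samples - 1) for i in range(num_samples)]


if __name__ == "__main__":
    # Sparse-view setting from the report: 30 training views per scene.
    train_ids = evenly_sample_indices(num_frames=300, num_samples=30)
    print(len(train_ids), train_ids[0], train_ids[-1])
```

For a 10-frame sequence, `evenly_sample_indices(10, 4)` gives `[0, 3, 6, 9]`. Integer arithmetic is used instead of rounding floats so that the same frames are selected on every run and platform.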
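The Experiment Setup row lists the reported hyperparameters. These can be collected into a single configuration object, as in the sketch below. The class name, field names and the `total_loss` helper are all hypothetical. In particular, the report only gives the values of $\lambda_{\text{sem}}$ and $\lambda_{\text{2d}}$; treating them as weights in an additive sum with the 3DGS image loss is an ASSUMPTION made for illustration, not the paper's stated objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EconSGSetup:
    """Hypothetical container for the hyperparameters reported in the table."""

    tau1: float = 0.45           # τ1, used in CRR's SAM / 2D-VLM mutual refinement
    tau2: float = 0.6            # τ2, used in CRR's SAM / 2D-VLM mutual refinement
    semantic_lr: float = 0.0025  # Adam learning rate for the latent semantic fields
    lambda_sem: float = 1.0      # λ_sem, weight for a semantic-scene term
    lambda_2d: float = 1.0       # λ_2d, weight for a 2D term
    iterations: int = 30_000     # training iterations per scene
    # Image-scene (3DGS) parameters are left at the original 3DGS defaults,
    # so they are intentionally not repeated here.


def total_loss(image_loss: float, sem_loss: float, loss_2d: float,
               cfg: EconSGSetup) -> float:
    """ASSUMPTION: the semantic terms are added to the image loss as a weighted sum."""
    return image_loss + cfg.lambda_sem * sem_loss + cfg.lambda_2d * loss_2d


if __name__ == "__main__":
    cfg = EconSGSetup()
    print(cfg)
    print(total_loss(0.2, 0.1, 0.05, cfg))
```

Freezing the dataclass keeps the reported settings from being mutated accidentally during a run. A sweep can still derive variants with `dataclasses.replace(cfg, lambda_2d=0.5)`.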