reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians

Authors: Can Zhang, Gim H Lee

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform a series of experiments to demonstrate the effectiveness of our proposed method across various 3D scene understanding tasks. We evaluate our method on the 2D semantic segmentation benchmarks: ScanNet (Dai et al., 2017) and Replica (Straub et al., 2019), and 3D open-vocabulary segmentation benchmarks: LERF (Kerr et al., 2023) and 3DOVS (Liu et al., 2024) to compare with previous work, and provide results from ablation studies.
Researcher Affiliation	Academia	Can Zhang & Gim Hee Lee, Department of Computer Science, National University of Singapore. EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology and components (CRR, Low-Dimensional 3D Contextual Space, 3DGS Semantic Fields) using prose and mathematical equations. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code	Yes	Our source code is available at: https://lulusindazc.github.io/econSGproject/.
Open Datasets	Yes	We evaluate our method on the 2D semantic segmentation benchmarks: ScanNet (Dai et al., 2017) and Replica (Straub et al., 2019), and 3D open-vocabulary segmentation benchmarks: LERF (Kerr et al., 2023) and 3DOVS (Liu et al., 2024).
Dataset Splits	Yes	For both ScanNet and Replica, we construct training and test sets by evenly sampling sequences in each scene. ... For LERF and 3DOVS, we follow the settings in LangSplat (Qin et al., 2023) where LERF is extended with ground truth masks annotated for language queries and 3DOVS consists of 20-30 images for each scene with the resolution of 4032x3024. ... We also perform robustness comparison by evenly sampling sparse training views for optimization (30 images per-scene in our experiments).
Hardware Specification	Yes	For all datasets, we train each scene for 30K iterations on one NVIDIA RTX-4090 GPU.
Software Dependencies	No	The paper mentions using 'OpenSeg (Ghiasi et al., 2022)', 'LSeg (Li et al., 2022)', 'Openclip (Ilharco et al., 2021)', 'SAM' and 'Adam optimizer'. However, no specific version numbers for these software components or any programming language environments (e.g., Python, PyTorch versions) are provided.
Experiment Setup	Yes	We then use SAM for mutual refinement with the 2D VLMs in our CRR to get the semantic features where we set τ1 = 0.45, τ2 = 0.6. We use the Adam optimizer with the learning rate 0.0025 for latent semantic fields. For parameters to train the image scene, we follow the default setting in the original 3DGS (Kerbl et al., 2023). For additional parameters introduced to train the semantic scene, we set λsem = 1, λ2d = 1. For all datasets, we train each scene for 30K iterations on one NVIDIA RTX-4090 GPU.
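
For readers who want the reported settings in one place, the values quoted in the Experiment Setup row can be collected into a small configuration object. This is only a sketch: the class name, field names, and the `evenly_sample_views` helper are hypothetical and do not come from the authors' code. The sampling rule is one plausible reading of "evenly sampling" from the Dataset Splits row, not the paper's confirmed procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SemanticFieldConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    tau1: float = 0.45           # threshold τ1 used in CRR
    tau2: float = 0.6            # threshold τ2 used in CRR
    semantic_lr: float = 0.0025  # Adam learning rate for latent semantic fields
    lambda_sem: float = 1.0      # weight λsem
    lambda_2d: float = 1.0       # weight λ2d
    iterations: int = 30_000     # training iterations per scene


def evenly_sample_views(num_frames: int, num_views: int) -> List[int]:
    """Pick num_views frame indices spread evenly over a sequence.

    Assumption: "evenly sampling" means equally spaced indices that
    include the first and last frame. The paper may use a different rule.
    """
    if num_frames <= 0 or num_views <= 0:
        return []
    if num_views >= num_frames:
        return list(range(num_frames))
    if num_views == 1:
        return [0]
    step = (num_frames - 1) / (num_views - 1)
    return [round(i * step) for i in range(num_views)]


if __name__ == "__main__":
    cfg = SemanticFieldConfig()
    print(cfg)
    # Sparse-view setting from the table: 30 training images per scene.
    # The sequence length of 300 is an arbitrary illustrative value.
    print(evenly_sample_views(num_frames=300, num_views=30))
```

Parameters for the image scene are not listed here because the table states only that they follow the default 3DGS settings (Kerbl et al., 2023).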