Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Authors: Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ScanNet, ScanNet200, SceneNN and 3RScan show our method achieves state-of-the-art performance among online 3D perception models, even outperforming offline VFM-assisted 3D instance segmentation methods by a large margin.
Researcher Affiliation | Academia | ¹Tsinghua University, ²Nanyang Technological University
Pseudocode | No | The paper describes methods with textual explanations and mathematical formulas (Eq. 1-9) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available¹. ¹Project page: https://xuxw98.github.io/ESAM/
Open Datasets | Yes | We evaluate our method on four datasets: ScanNet (Dai et al., 2017), ScanNet200 (Rozenberszki et al., 2022), SceneNN (Hua et al., 2016) and 3RScan (Wald et al., 2019).
Dataset Splits | Yes | ScanNet contains 1513 scanned scenes, out of which we use 1201 sequences for training and the rest 312 for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It mentions 'VFM' and '3D U-Net' but not the underlying hardware.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., PyTorch 1.9, CUDA 11.1).
Experiment Setup | Yes | For hyperparameters, we set ϕ = 0.5, ϵ = 1.75, τ = 0.02, α = 0.5 and β = 0.5. In the dual-level query decoder, we actually set F = F_S for the first two iterations of mask prediction, and then set F = F_P.
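
The reported experiment setup could be recorded as a small configuration object, which may help when comparing a re-run against the paper. The sketch below is illustrative only: the class name `ESAMHyperparams`, the field `fs_iters`, the function `feature_level`, and the total of three decoder iterations in the usage line are assumptions made here, not names or values taken from the paper or its code release. Only the five hyperparameter values and the "first two iterations use F_S, then F_P" rule come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESAMHyperparams:
    """Hyperparameter values as quoted in the Experiment Setup row.

    Field names are hypothetical; the paper only gives the symbols.
    """
    phi: float = 0.5       # ϕ
    epsilon: float = 1.75  # ϵ
    tau: float = 0.02      # τ
    alpha: float = 0.5     # α
    beta: float = 0.5      # β
    fs_iters: int = 2      # number of initial mask-prediction iterations with F = F_S


def feature_level(iteration: int, hp: ESAMHyperparams = ESAMHyperparams()) -> str:
    """Return which feature set F is used at a 0-based decoder iteration."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    # The first `fs_iters` iterations use F_S, every later one uses F_P.
    return "F_S" if iteration < hp.fs_iters else "F_P"


# Assumed total of three decoder iterations, purely for illustration.
schedule = [feature_level(i) for i in range(3)]
```

With the assumed three iterations, `schedule` lists the feature set per iteration in order, so a reimplementation can check its decoder loop against the rule stated in the paper.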