Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Authors: Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on ScanNet, ScanNet200, SceneNN and 3RScan show our method achieves state-of-the-art performance among online 3D perception models, even outperforming offline VFM-assisted 3D instance segmentation methods by a large margin. |
| Researcher Affiliation | Academia | Tsinghua University; Nanyang Technological University |
| Pseudocode | No | The paper describes methods with textual explanations and mathematical formulas (Eq 1-9) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available. Project page: https://xuxw98.github.io/ESAM/ |
| Open Datasets | Yes | We evaluate our method on four datasets: ScanNet (Dai et al., 2017), ScanNet200 (Rozenberszki et al., 2022), SceneNN (Hua et al., 2016) and 3RScan (Wald et al., 2019). |
| Dataset Splits | Yes | ScanNet contains 1513 scanned scenes, of which we use 1201 sequences for training and the remaining 312 for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It mentions 'VFM' and '3D U-Net' but not the underlying hardware. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | For hyperparameters, we set φ = 0.5, ε = 1.75, τ = 0.02, α = 0.5 and β = 0.5. In the dual-level query decoder, we actually set F = F_S for the first two iterations of mask prediction, and then set F = F_P. |
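For reference, the hyperparameters quoted in the row above can be collected into a single configuration sketch. This is illustrative only: the variable names and the helper function are not taken from the authors' released code, and only the numeric values and the two-iteration switch from superpoint features (F_S) to point features (F_P) come from the paper's stated setup.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Names are illustrative; the authors' code may organize these differently.
ESAM_HPARAMS = {
    "phi": 0.5,       # φ
    "epsilon": 1.75,  # ε
    "tau": 0.02,      # τ
    "alpha": 0.5,     # α
    "beta": 0.5,      # β
}

def query_feature_source(iteration: int) -> str:
    """Dual-level query decoder schedule (hypothetical helper):
    use superpoint features F_S for the first two mask-prediction
    iterations, then switch to point features F_P."""
    return "F_S" if iteration < 2 else "F_P"
```

A caller would look up the feature source per decoder iteration, e.g. `query_feature_source(0)` and `query_feature_source(1)` return `"F_S"`, while later iterations return `"F_P"`.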