GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision

Authors: Zihui Zhang, Yafei Yang, Hongtao Wen, Bo Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We extensively evaluate our method on two real-world datasets and a newly created synthetic dataset, demonstrating remarkable segmentation performance, clearly surpassing all existing unsupervised methods." (Section 4 Experiments; 4.1 Evaluation on ScanNet; 4.4 Ablation Study)
Researcher Affiliation | Academia | Zihui Zhang 1,2; Yafei Yang 1,2; Hongtao Wen 1,2; Bo Yang 1,2. 1 Shenzhen Research Institute, The Hong Kong Polytechnic University; 2 vLAR Group, The Hong Kong Polytechnic University.
Pseudocode | No | The paper describes its methods as numbered steps within paragraph text (e.g., Section 3.3, under 'Object Discovery Branch as an Embodied Agent' and 'Reward Design'), but it presents no formal pseudocode blocks or explicitly labeled algorithms.
Open Source Code | Yes | "Our code and data are available at https://github.com/vLAR-group/GrabS"
Open Datasets | Yes | "We extensively evaluate our method on two real-world datasets and a newly created synthetic dataset... Datasets: We evaluate on three datasets: 1) The challenging real-world ScanNet dataset (Dai et al., 2017), comprising 1201/312/100 indoor scenes for training/validation/test respectively; 2) The real-world S3DIS dataset (Armeni, 2017), including 6 areas of indoor scenes; 3) Our own synthetic dataset with 4000/1000 training/test scenes. ... ShapeNet (Chang et al., 2015)... we have released it for future studies."
Dataset Splits | Yes | "ScanNet dataset (Dai et al., 2017), comprising 1201/312/100 indoor scenes for training/validation/test respectively... Our own synthetic dataset with 4000/1000 training/test scenes."
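For quick reference, the quoted split counts can be restated as a small sketch; the dictionary names and the `split_fractions` helper are illustrative, with only the scene counts taken from the paper's text.

```python
# Reported scene counts per split (numbers quoted from the paper).
SPLITS = {
    "ScanNet": {"train": 1201, "val": 312, "test": 100},
    "Synthetic": {"train": 4000, "test": 1000},
}

def split_fractions(counts):
    """Return each split's share of the dataset's total scene count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

print(split_fractions(SPLITS["Synthetic"]))  # → {'train': 0.8, 'test': 0.2}
```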
Hardware Specification | Yes | "The hardware for testing is a single RTX 3090 GPU with an AMD R9 5900X CPU."
Software Dependencies | No | The paper mentions software components such as the PPO loss, Mask3D, SparseConv, Adam, AdamW, and a transformer decoder, but it does not specify version numbers for any software dependencies (e.g., PyTorch version, CUDA version, or specific library versions).
Experiment Setup | Yes | "We train our model on the ScanNet training set for 450 epochs with a batch size of 8. The optimizer is Adam with a learning rate of 0.0001 in all training epochs." ... "The optimizer is AdamW with a learning rate of 0.0001 in all training epochs. The Custom30M version of SparseConv (Choy et al., 2019) with a transformer decoder is chosen as the backbone and segmentation head." ... "we use a 5cm voxel size. We train our model for 150 epochs on this dataset with a batch size of 10."
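The quoted setup can be collected into a single configuration sketch. This is a minimal illustration, not the authors' actual config: the class and field names are my own, and only the values are taken from the quotes above (the dataset for the second 150-epoch run is not named in the quote, so it is left as a placeholder).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the paper's experiment setup."""
    dataset: str
    epochs: int
    batch_size: int
    optimizer: str            # "Adam" or "AdamW", per the quoted text
    learning_rate: float = 1e-4
    voxel_size_cm: float = 5.0
    backbone: str = "SparseConv (Custom30M) + transformer decoder"

# Two reported training regimes: 450 epochs / batch 8 on ScanNet with Adam,
# and a 150-epoch / batch-10 run (dataset unnamed in the quote) with AdamW.
scannet_cfg = TrainConfig("ScanNet", epochs=450, batch_size=8, optimizer="Adam")
second_cfg = TrainConfig("(unnamed dataset)", epochs=150, batch_size=10,
                         optimizer="AdamW")
```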