GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
Authors: Zihui Zhang, Yafei Yang, Hongtao Wen, Bo Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our method on two real-world datasets and a newly created synthetic dataset, demonstrating remarkable segmentation performance, clearly surpassing all existing unsupervised methods. (Sections: 4 EXPERIMENTS, 4.1 EVALUATION ON SCANNET, 4.4 ABLATION STUDY) |
| Researcher Affiliation | Academia | Zihui Zhang 1,2 Yafei Yang 1,2 Hongtao Wen 1,2 Bo Yang 1,2 1 Shenzhen Research Institute, The Hong Kong Polytechnic University 2 vLAR Group, The Hong Kong Polytechnic University |
| Pseudocode | No | The paper describes methods and steps in numbered lists within paragraph text (e.g., in Section 3.3 under 'Object Discovery Branch as an Embodied Agent' and 'Reward Design'), but it does not present any formal pseudocode blocks or algorithms explicitly labeled as such. |
| Open Source Code | Yes | Our code and data are available at https://github.com/vLAR-group/GrabS |
| Open Datasets | Yes | We extensively evaluate our method on two real-world datasets and a newly created synthetic dataset... Datasets: We evaluate on three datasets: 1) The challenging real-world ScanNet dataset (Dai et al., 2017), comprising 1201/312/100 indoor scenes for training/validation/test respectively; 2) The real-world S3DIS dataset (Armeni, 2017), including 6 areas of indoor scenes; 3) Our own synthetic dataset with 4000/1000 training/test scenes. ... ShapeNet (Chang et al., 2015)... we have released it for future studies. |
| Dataset Splits | Yes | ScanNet dataset (Dai et al., 2017), comprising 1201/312/100 indoor scenes for training/validation/test respectively... Our own synthetic dataset with 4000/1000 training/test scenes. |
| Hardware Specification | Yes | The hardware for testing is a single RTX 3090 GPU with an AMD R9 5900X CPU. |
| Software Dependencies | No | The paper mentions software components such as PPO loss, Mask3D, SparseConv, Adam, AdamW, and a transformer decoder, but it does not specify any version numbers for these software dependencies (e.g., PyTorch version, CUDA version, or specific library versions). |
| Experiment Setup | Yes | We train our model on the ScanNet training set for 450 epochs with a batch size of 8. The optimizer is Adam with a learning rate of 0.0001 in all training epochs. The optimizer is AdamW with a learning rate of 0.0001 in all training epochs. The Custom30M version of SparseConv (Choy et al., 2019) with a transformer decoder is chosen as the backbone and segmentation head. We use a 5cm voxel size. We train our model for 150 epochs on this dataset with a batch size of 10. |
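The hyperparameters quoted in the Experiment Setup row can be collected into a small config sketch for anyone attempting a reproduction. This is plain Python with names of our own choosing (`TrainConfig`, `SCANNET`, `SYNTHETIC` are not identifiers from the released GrabS code), assuming the AdamW sentence describes the ScanNet run and the 150-epoch/batch-10 sentence describes the synthetic dataset:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters as quoted from the paper (assumed grouping)."""
    epochs: int = 450           # ScanNet: 450 epochs
    batch_size: int = 8         # ScanNet: batch size 8
    lr: float = 1e-4            # constant learning rate, all epochs
    optimizer: str = "AdamW"    # paper also mentions Adam; exact usage per stage is unclear
    voxel_size_m: float = 0.05  # 5 cm voxelization
    backbone: str = "SparseConv (Choy et al., 2019) + transformer decoder"

SCANNET = TrainConfig()
SYNTHETIC = TrainConfig(epochs=150, batch_size=10)  # synthetic-dataset variant
```

Note that the paper's quotes name both Adam and AdamW with the same learning rate, so the per-component optimizer assignment here is a guess; consult the released code at the GitHub link above for the authoritative values.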