GMAP: Generalized Manipulation of Articulated Objects in Robotic Using Pre-trained Model
Authors: Hongliang Zeng, Ping Zhang, Fang Li, Qinpeng Yi, Tingyu Ye, Jiahua Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that GMAP achieves state-of-the-art (SOTA) performance in both the perception and manipulation of articulated objects and adapts to real-world scenarios. We conducted comprehensive experimental validation on two widely recognized articulated object datasets, PartNet-Mobility (Mo et al. 2019) and Shape2Motion (Wang et al. 2019). GMAP achieved an 80% Intersection over Union (IoU) in the part segmentation task, while in the joint orientation prediction task, the prediction error was maintained at approximately 0.42. Additionally, we achieved a 36.94% success rate in instruction-based push manipulation in the SAPIEN (Xiang et al. 2020) simulator and successfully manipulated three different types of articulated objects in real-world environments. |
| Researcher Affiliation | Academia | Hongliang Zeng, Ping Zhang*, Fang Li, Qinpeng Yi, Tingyu Ye, Jiahua Wang South China University of Technology EMAIL, EMAIL |
| Pseudocode | No | The paper describes the method using prose, mathematical formulas, and diagrams (Fig. 2, Fig. 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/robhlzeng/GMAP |
| Open Datasets | Yes | We conducted comprehensive experimental validation on two widely recognized articulated object datasets, PartNet-Mobility (Mo et al. 2019) and Shape2Motion (Wang et al. 2019). ... We pre-trained the MSFE module on ShapeNet (Chang et al. 2015) to boost its 3D shape feature extraction. ... We also tested our method's manipulation planning on PartNet-Mobility instances in the SAPIEN simulator (Xiang et al. 2020). |
| Dataset Splits | Yes | Using a depth camera, we collected point cloud data in various states, dividing it 9:1:1 for training, validation, and testing. |
| Hardware Specification | No | As shown in the top left corner of Fig. 5, our real-world experimental setup is equipped with a RealSense2 RGB-D camera, which is used to capture depth images of objects. The paper names the camera used for data collection but does not specify the computational hardware (CPU/GPU models, memory) used for training or inference of the models. |
| Software Dependencies | No | The paper mentions using AdamW as an optimizer and Point-MGE for pre-training, but it does not specify version numbers for any software libraries, programming languages (e.g., Python), or frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For MSFE, we set three scales with point patches M = {512, 256, 64} and points per patch K = {32, 8, 8}. Using AdamW (Loshchilov and Hutter 2017), we pre-trained for 300 epochs as Point-MGE (Zeng et al. 2024a). To deepen the model's grasp of articulated objects, we added 100 epochs post-training on an articulated object dataset. |
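The quoted multi-scale settings (M = {512, 256, 64} point patches with K = {32, 8, 8} points per patch) can be captured in a small configuration sketch for anyone attempting a reproduction. The names `MSFEConfig` and `points_covered` are illustrative, not from the paper or its repository:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MSFEConfig:
    """Hypothetical container for the multi-scale feature-extractor
    hyperparameters reported in the paper: number of point patches M
    and points per patch K at each of the three scales."""
    num_patches: tuple = (512, 256, 64)  # M, per the paper's order
    patch_size: tuple = (32, 8, 8)       # K, points grouped per patch

    def points_covered(self):
        # Upper bound on points touched at each scale; actual coverage
        # may differ if patches overlap (e.g., FPS centers + kNN groups).
        return [m * k for m, k in zip(self.num_patches, self.patch_size)]

cfg = MSFEConfig()
print(cfg.points_covered())  # [16384, 2048, 512]
```

Note that the coarsest scale (64 patches of 8 points) touches at most 512 points, so the per-scale inputs shrink sharply; whether the scales are applied to the same cloud or to progressively downsampled ones is not stated in the excerpt.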