GMAP: Generalized Manipulation of Articulated Objects in Robotics Using Pre-trained Model

Authors: Hongliang Zeng, Ping Zhang, Fang Li, QinPeng Yi, Tingyu Ye, Jiahua Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that GMAP achieves state-of-the-art (SOTA) performance in both the perception and manipulation of articulated objects and adapts to real-world scenarios. We conducted comprehensive experimental validation on two widely recognized articulated object datasets, PartNet-Mobility (Mo et al. 2019) and Shape2Motion (Wang et al. 2019). GMAP achieved an 80% Intersection over Union (IoU) in the part segmentation task, while in the joint orientation prediction task the prediction error was maintained at approximately 0.42. Additionally, we achieved a 36.94% success rate in instruction-based push manipulation in the SAPIEN (Xiang et al. 2020) simulator and successfully manipulated three different types of articulated objects in real-world environments.
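The reported 80% IoU for part segmentation is typically computed per part label and then averaged. A minimal sketch of that metric; the paper does not specify its exact averaging protocol, so the function name and the per-part mean here are illustrative:

```python
import numpy as np

def part_iou(pred, gt, num_parts):
    """Mean Intersection-over-Union across part labels for one point cloud.

    pred, gt: integer arrays of per-point part labels.
    Parts absent from both prediction and ground truth are skipped.
    """
    ious = []
    for part in range(num_parts):
        inter = np.logical_and(pred == part, gt == part).sum()
        union = np.logical_or(pred == part, gt == part).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Two parts on six points; prediction mislabels one point.
gt   = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([0, 0, 1, 1, 1, 1])
print(part_iou(pred, gt, num_parts=2))  # mean of 2/3 and 3/4
```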
Researcher Affiliation Academia Hongliang Zeng, Ping Zhang*, Fang Li, Qinpeng Yi, Tingyu Ye, Jiahua Wang, South China University of Technology. EMAIL, EMAIL
Pseudocode No The paper describes the method using prose, mathematical formulas, and diagrams (Fig. 2, Fig. 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code https://github.com/robhlzeng/GMAP
Open Datasets Yes We conducted comprehensive experimental validation on two widely recognized articulated object datasets, PartNet-Mobility (Mo et al. 2019) and Shape2Motion (Wang et al. 2019). ... We pre-trained the MSFE module on ShapeNet (Chang et al. 2015) to boost its 3D shape feature extraction. ... We also tested our method's manipulation planning on PartNet-Mobility instances in the SAPIEN (Xiang et al. 2020) simulator.
Dataset Splits Yes Using a depth camera, we collected point cloud data in various states, dividing it 9:1:1 for training, validation, and testing.
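A 9:1:1 split like the one described can be sketched as follows; the function name, seeding, and rounding choices are illustrative, not taken from the released code:

```python
import random

def split_9_1_1(items, seed=0):
    """Shuffle and partition items into train/val/test at 9:1:1 proportions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * 9 / 11)
    n_val = round(n * 1 / 11)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_9_1_1(range(110))
print(len(train), len(val), len(test))  # 90 10 10
```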
Hardware Specification No As shown in the top left corner of Fig. 5, our real-world experimental setup is equipped with a RealSense2 RGB-D camera, which is used to capture depth images of objects. The paper mentions a specific camera used for data collection, but does not provide specific details about the computational hardware (CPU/GPU models, memory) used for training or inference of the models.
Software Dependencies No The paper mentions using AdamW as an optimizer and Point-MGE for pre-training, but it does not specify version numbers for any software libraries, programming languages (e.g., Python), or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup Yes For MSFE, we set three scales with point patches M = {512, 256, 64} and points per patch K = {32, 8, 8}. Using AdamW (Loshchilov and Hutter 2017), we pre-trained for 300 epochs as Point-MGE (Zeng et al. 2024a). To deepen the model's grasp of articulated objects, we added 100 epochs post-training on an articulated object dataset.
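The M/K setting describes, at each scale, M patch centers with K points grouped around each. A toy sketch of that grouping; random center sampling stands in here for whatever sampling the actual MSFE module uses (e.g., farthest-point sampling), and all names are hypothetical:

```python
import numpy as np

def group_points(points, num_patches, patch_size, rng):
    """Toy multi-scale grouping: pick patch centers, gather the K nearest
    points to each. Illustrates the M/K patching scheme only; the real
    module operates on learned features."""
    centers = points[rng.choice(len(points), num_patches, replace=False)]
    # Pairwise distances: (num_patches, num_points)
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :patch_size]
    return points[idx]  # (num_patches, patch_size, 3)

rng = np.random.default_rng(0)
cloud = rng.standard_normal((2048, 3))
for m, k in zip([512, 256, 64], [32, 8, 8]):  # the reported M and K per scale
    patches = group_points(cloud, m, k, rng)
    print(patches.shape)  # (512, 32, 3), (256, 8, 3), (64, 8, 3)
```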