Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Authors: Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive quantitative experiments on the standard PartNet-Mobility dataset, ARTICULATE-ANYTHING substantially outperforms prior work, increasing the success rate from 8.7-12.2% to 75% and setting a new bar for state-of-the-art performance. |
| Researcher Affiliation | Academia | Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton University of Pennsylvania |
| Pseudocode | Yes | Code 1: Joint failure attribution. |
| Open Source Code | Yes | Full video demonstrations and source code are available on the website. |
| Open Datasets | Yes | Datasets: We use the PartNet-Mobility dataset (Mo et al., 2018), which includes human annotations for 2.3K objects, 1.9K revolute joints, and 7.6K prismatic joints. |
| Dataset Splits | Yes | We evaluate performance on these five (in-distribution) classes and the remaining 41 (out-of-distribution) classes. |
| Hardware Specification | No | No specific hardware (GPU, CPU models, or detailed computer specifications) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions several software components, such as Google's Gemini Flash-1.5, PyBullet, Sapien, CoTracker, and Stable-Baselines3, but does not provide specific version numbers for these software libraries or frameworks, except for the VLM model name. |
| Experiment Setup | Yes | We use few-shot prompting with around 20 in-context examples. The position threshold is set to 50mm and the angular threshold to 0.25 radians (~14.3 degrees). This process terminates when the rating exceeds a threshold of 5. We train a Franka arm to perform four robotic manipulation tasks in the Robosuite simulator using PPO and our generated assets. The policy outputs joint and gripper positions. We train policies over 3 random seeds per task for 2 million environment steps using PPO from the Stable-Baselines3 library (Raffin et al., 2021). We randomize physics (friction, damping, friction loss, etc.), object scales, and poses to obtain robust policies. |
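The quoted setup states the success thresholds for joint prediction: 50mm on position and 0.25 radians (~14.3 degrees) on the joint axis. A minimal sketch of such a check is below; the function and argument names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

POS_THRESHOLD_M = 0.050    # 50 mm position threshold from the paper
ANG_THRESHOLD_RAD = 0.25   # ~14.3 degree angular threshold from the paper

def joint_prediction_succeeds(pred_origin, gt_origin, pred_axis, gt_axis):
    """Return True if a predicted joint matches ground truth within both thresholds."""
    pos_err = np.linalg.norm(np.asarray(pred_origin, float) - np.asarray(gt_origin, float))
    # Angle between the (unit-normalized) joint axes via the dot product.
    a = np.asarray(pred_axis, float)
    a = a / np.linalg.norm(a)
    b = np.asarray(gt_axis, float)
    b = b / np.linalg.norm(b)
    ang_err = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return pos_err <= POS_THRESHOLD_M and ang_err <= ANG_THRESHOLD_RAD
```

For example, a prediction 10mm off with a perfectly aligned axis passes, while one 100mm off or with a perpendicular axis fails.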
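The setup also mentions randomizing physics parameters (friction, damping, friction loss), object scales, and poses to obtain robust policies. A hedged sketch of one such per-episode sampler is below; the parameter names follow the quote, but the ranges are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def sample_randomization(rng):
    """Draw one set of domain-randomization parameters for an episode.

    Ranges are placeholder assumptions for illustration only.
    """
    return {
        "friction": rng.uniform(0.5, 1.5),        # contact friction multiplier
        "damping": rng.uniform(0.8, 1.2),         # joint damping multiplier
        "frictionloss": rng.uniform(0.0, 0.1),    # static friction loss
        "object_scale": rng.uniform(0.9, 1.1),    # uniform object rescaling
        "object_yaw": rng.uniform(-np.pi, np.pi), # randomized object pose (yaw)
    }
```

In practice, one sample like this would be drawn at each environment reset and written into the simulator's physics model before rollout.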