Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Authors: Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive quantitative experiments on the standard PartNet-Mobility dataset, ARTICULATE-ANYTHING substantially outperforms prior work, increasing the success rate from 8.7-12.2% to 75% and setting a new bar for state-of-the-art performance.
Researcher Affiliation | Academia | Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton (University of Pennsylvania)
Pseudocode | Yes | Code 1: Joint failure attribution.
Open Source Code | Yes | Full video demonstrations and source code are available on the website.
Open Datasets | Yes | Datasets: We use the PartNet-Mobility dataset (Mo et al., 2018), which includes human annotations for 2.3K objects, 1.9K revolute joints, and 7.6K prismatic joints.
Dataset Splits | Yes | We evaluate performance on these five (in-distribution) classes and on the remaining 41 (out-of-distribution) classes.
Hardware Specification | No | No specific hardware (GPU or CPU models, or other detailed machine specifications) used to run the experiments is provided.
Software Dependencies | No | The paper mentions several software components, such as Google's Gemini Flash-1.5, PyBullet, Sapien, CoTracker, and Stable-Baselines3, but does not provide version numbers for these libraries or frameworks, except for the VLM model name.
Experiment Setup | Yes | We use few-shot prompting with around 20 in-context examples. The position threshold is set to 50 mm and the angular threshold to 0.25 radians (approximately 14.3 degrees). This process terminates when the rating exceeds a threshold of 5. We train a Franka arm to perform four robotic manipulation tasks in the Robosuite simulator using PPO and our generated assets. The policy outputs joint and gripper positions. We train policies over 3 random seeds per task for 2 million environment steps using PPO in the Stable-Baselines3 library (Raffin et al., 2021). We randomize physics (friction, damping, frictionloss, etc.), object scales, and poses to obtain robust policies.
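The Experiment Setup row quotes two evaluation thresholds: a 50 mm position threshold and a 0.25 radian angular threshold (roughly 14.3 degrees). A minimal sketch of how such a per-joint check could be applied is below; the function name and the assumption that both thresholds must be satisfied jointly are illustrative, not taken from the paper's released code.

```python
import math

# Thresholds as stated in the paper's experiment setup.
POS_THRESHOLD_MM = 50.0    # position error threshold, millimeters
ANG_THRESHOLD_RAD = 0.25   # angular error threshold, radians (~14.3 degrees)


def joint_prediction_succeeds(pos_error_mm: float, ang_error_rad: float) -> bool:
    """Hypothetical success check: True when both the position error and the
    axis-angle error fall within the stated thresholds (AND is an assumption)."""
    return pos_error_mm <= POS_THRESHOLD_MM and ang_error_rad <= ANG_THRESHOLD_RAD


# The 0.25 rad threshold expressed in degrees, matching the quoted ~14.3.
print(round(math.degrees(ANG_THRESHOLD_RAD), 1))  # 14.3
print(joint_prediction_succeeds(32.0, 0.10))      # within both thresholds
print(joint_prediction_succeeds(60.0, 0.10))      # position error too large
```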