Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Authors: Yang Tian, Sizhe Yang, Jia Zeng, Ping Wang, Dahua Lin, Hao Dong, Jiangmiao Pang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on both simulation and real-world benchmarks. On two widely adopted simulation benchmarks, LIBERO-LONG (Liu et al., 2024) (10 tasks) and CALVIN ABC-D (Mees et al., 2022) (34 tasks), our method demonstrates a 10.4% improvement in success rate and a 0.75 increase in average task completion length compared to state-of-the-art baselines.
Researcher Affiliation | Collaboration | 1 Shanghai AI Laboratory; 2 CFCS, School of CS, Peking University; 3 National Engineering Research Center for Software Engineering, Peking University; 4 School of Software & Microelectronics, Peking University; 5 Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education; 6 The Chinese University of Hong Kong
Pseudocode | No | The paper describes its methodology using prose and architectural diagrams (Figure 2, Figure A-1) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and models are publicly available at https://github.com/OpenRobotLab/Seer/
Open Datasets | Yes | It is initially pretrained on large-scale robotic datasets, such as DROID, and can be adapted to real-world scenarios with a little fine-tuning data. We conduct experiments on two simulation benchmarks, LIBERO-LONG (Liu et al., 2024) and CALVIN ABC-D (Mees et al., 2022).
Dataset Splits | Yes | For pre-training, we utilize the official robot play data with no language instructions, while the remaining data with full annotations is used for fine-tuning. Evaluation is conducted in Environment D, which differs visually from Environments A, B, and C where the training data was collected. In the fine-tuning phase, we capture RGB images, robot states, and actions at 15 Hz, collecting 100 demonstrations per task. The results, shown in Figure 3, demonstrate that our method consistently enhances policy performance across varying data sizes. Notably, under data-scarce conditions with only 10% of the training data, the pre-trained policy achieves a 187% relative improvement in success rate on LIBERO-LONG and a 150% relative improvement in average task length on CALVIN ABC-D compared to training from scratch.
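The "relative improvement" figures quoted above follow the standard definition (new − old) / old. A minimal sketch of that arithmetic, using hypothetical success rates chosen only to illustrate the computation (they are not taken from the paper):

```python
def relative_improvement(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Hypothetical illustration: a from-scratch policy succeeding 16.0% of the
# time vs. a pre-trained policy at 45.9% yields roughly a 187% relative gain.
print(round(relative_improvement(45.9, 16.0)))  # prints 187
```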
Hardware Specification | Yes | For all simulation results, we use eight 4090 GPUs to pre-train and fine-tune.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions) are mentioned in the paper.
Experiment Setup | Yes | Table A-I: Training hyperparameters (pre-training → fine-tuning): Batch Size 640 (LIBERO & CALVIN) / 2048 (Real) → 512; Learning Rate 1e-4 → 1e-3; Optimizer AdamW → AdamW; Learning Rate Schedule cosine decay → cosine decay; Training Epochs 30 (LIBERO & Real) / 20 (CALVIN) → 40 (LIBERO & Real) / 20 (CALVIN); History Length 7 (LIBERO & Real) / 10 (CALVIN) → 7 (LIBERO & Real) / 10 (CALVIN); Action Chunk Length 3 → 3.
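The hyperparameters in Table A-I amount to a small training configuration. A minimal sketch of the fine-tuning values plus a standard cosine-decay learning-rate schedule; the exact decay variant, minimum LR, and per-epoch (rather than per-step) granularity are assumptions, since the paper only names "cosine decay":

```python
import math

def cosine_decay_lr(epoch, total_epochs, peak_lr, min_lr=0.0):
    """Standard cosine decay from peak_lr at epoch 0 down to min_lr."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Fine-tuning values as reported in Table A-I (dict keys are illustrative names).
FINETUNE_HPARAMS = {
    "batch_size": 512,
    "learning_rate": 1e-3,
    "optimizer": "AdamW",
    "epochs": {"libero": 40, "real": 40, "calvin": 20},
    "history_length": {"libero": 7, "real": 7, "calvin": 10},
    "action_chunk_length": 3,
}

print(cosine_decay_lr(0, 40, FINETUNE_HPARAMS["learning_rate"]))   # peak LR at epoch 0
print(cosine_decay_lr(40, 40, FINETUNE_HPARAMS["learning_rate"]))  # decayed to ~0 at the end
```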