Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Authors: Yang Tian, Sizhe Yang, Jia Zeng, Ping Wang, Dahua Lin, Hao Dong, Jiangmiao Pang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on both simulation and real-world benchmarks. On two widely adopted simulation benchmarks, LIBERO-LONG (Liu et al., 2024) (10 tasks) and CALVIN ABC-D (Mees et al., 2022) (34 tasks), our method demonstrates a 10.4% improvement in success rate and a 0.75 increase in average task completion length compared to state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | 1 Shanghai AI Laboratory 2 CFCS, School of CS, Peking University 3 National Engineering Research Center for Software Engineering, Peking University 4 School of Software & Microelectronics, Peking University 5 Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education 6 Chinese University of Hong Kong |
| Pseudocode | No | The paper describes its methodology using prose and architectural diagrams (Figure 2, Figure A-1) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are publicly available at https://github.com/OpenRobotLab/Seer/ |
| Open Datasets | Yes | It is initially pretrained on large-scale robotic datasets, such as DROID, and can be adapted to real-world scenarios with a small amount of fine-tuning data. We conduct experiments on two simulation benchmarks: LIBERO-LONG (Liu et al., 2024) and CALVIN ABC-D (Mees et al., 2022). |
| Dataset Splits | Yes | For pre-training, we utilize the official robot play data with no language instructions, while the remaining data with full annotations is used for fine-tuning. Evaluation is conducted in Environment D, which differs visually from Environments A, B, and C where the training data was collected. In the fine-tuning phase, we capture RGB images, robot states, and actions at 15 Hz, collecting 100 demonstrations per task. The results, shown in Figure 3, demonstrate that our method consistently enhances policy performance across varying data sizes. Notably, under data-scarce conditions with only 10% of the training data, the pre-trained policy achieves a 187% relative improvement in success rate on LIBERO-LONG and a 150% relative improvement in average task length on CALVIN ABC-D compared to training from scratch. |
| Hardware Specification | Yes | For all simulation results, we use eight NVIDIA RTX 4090 GPUs for pre-training and fine-tuning. |
| Software Dependencies | No | No specific software dependencies with version numbers (like Python, PyTorch, or CUDA versions) are mentioned in the paper. |
| Experiment Setup | Yes | Table A-I: Training hyperparameters (pre-training vs. fine-tuning). Batch Size: 640 (LIBERO & CALVIN) / 2048 (Real) vs. 512; Learning Rate: 1e-4 vs. 1e-3; Optimizer: AdamW (both); Learning Rate Schedule: cosine decay (both); Training Epochs: 30 (LIBERO & Real) / 20 (CALVIN) vs. 40 (LIBERO & Real) / 20 (CALVIN); History Length: 7 (LIBERO & Real) / 10 (CALVIN) (both); Action Chunk Length: 3 (both). |
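The hyperparameters in Table A-I can be collected into a configuration sketch. This is a minimal illustration assuming two training stages (pre-training and fine-tuning) with per-benchmark overrides; the dictionary names and the `get_config` helper are hypothetical, not taken from the authors' released code.

```python
# Hypothetical config sketch mirroring Table A-I of the paper.
# Structure and names are illustrative, not the authors' implementation.

PRETRAIN = {
    "batch_size": {"libero": 640, "calvin": 640, "real": 2048},
    "learning_rate": 1e-4,
    "optimizer": "AdamW",
    "lr_schedule": "cosine_decay",
    "epochs": {"libero": 30, "real": 30, "calvin": 20},
    "history_length": {"libero": 7, "real": 7, "calvin": 10},
    "action_chunk_length": 3,
}

FINETUNE = {
    "batch_size": {"libero": 512, "calvin": 512, "real": 512},
    "learning_rate": 1e-3,
    "optimizer": "AdamW",
    "lr_schedule": "cosine_decay",
    "epochs": {"libero": 40, "real": 40, "calvin": 20},
    "history_length": {"libero": 7, "real": 7, "calvin": 10},
    "action_chunk_length": 3,
}

def get_config(stage: str, benchmark: str) -> dict:
    """Flatten per-benchmark fields into a single flat config dict."""
    base = PRETRAIN if stage == "pretrain" else FINETUNE
    return {
        key: (value[benchmark] if isinstance(value, dict) else value)
        for key, value in base.items()
    }
```

For example, `get_config("pretrain", "calvin")` yields a batch size of 640, 20 epochs, and a history length of 10, matching the CALVIN column of the table.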