Data Scaling Laws in Imitation Learning for Robotic Manipulation
Authors: Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. |
| Researcher Affiliation | Academia | Fanqi Lin1,2,3 Yingdong Hu1,2,3 Pingyue Sheng1 Chuan Wen1,2,3 Jiacheng You1 Yang Gao1,2,3 1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper describes the methodologies in prose and does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://data-scaling-laws.github.io/. To further support researchers in this endeavor, we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems. |
| Open Datasets | Yes | Existing robotic manipulation datasets do not provide enough environments and objects for a single task to meet our requirements. Therefore, we opt to use the Universal Manipulation Interface (UMI) (Chi et al., 2024), a hand-held gripper, to independently collect a substantial number of demonstrations. ... we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems. |
| Dataset Splits | Yes | To evaluate the generalization performance of the policy, we exclusively test it in unseen environments or with unseen objects. ... In total, 21 policies are trained, and each is evaluated using 8 unseen objects in the same environment as the training data, with 5 trials per object. ... Each policy is evaluated in 8 unseen environments using the same object as in training, with 5 trials per environment. ... Each policy is evaluated in 8 unseen environments, using two unseen objects per environment, with 5 trials per environment. ... To calculate MSE, we collect 30 human demonstrations for each evaluation environment or object, forming the validation set. |
| Hardware Specification | Yes | Policy inference is performed on a workstation equipped with an NVIDIA 4090 GPU (24 GB VRAM). ... it takes 75 hours to complete on 8 A800 GPUs. |
| Software Dependencies | No | The paper mentions several techniques and models like Diffusion Policy, U-Net, DDIM, DINOv2, ImageNet, ResNet, CLIP ViT, ACT, and LoRA, but it does not specify any software libraries or frameworks with explicit version numbers. |
| Experiment Setup | Yes | Specifically, the policy trained on the smallest dataset undergoes 800 epochs, totaling 5.3 × 10^4 training steps. The policy trained on the largest dataset undergoes 75 epochs, totaling 5 × 10^5 training steps, which takes 75 hours to complete on 8 A800 GPUs. ... Config/value pairs: Image observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks); Proprioception observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks); Action horizon: 16; Observation resolution: 224 × 224; Environment frequency: 5; Optimizer: AdamW; Optimizer momentum: β1, β2 = 0.95, 0.999; Learning rate for action diffusion model: 3e-4; Learning rate for visual encoder: 3e-5; Learning rate schedule: cosine decay; Batch size: 256; Inference denoising iterations: 16; Temporal ensemble steps: 8; Temporal ensemble adaptation rate: -0.01 |
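The paper's headline claim is that generalization performance follows a roughly power-law relationship with the number of training environments and objects. Such a relationship, y = a·x^b, can be verified from evaluation data with a simple linear fit in log-log space. The sketch below is illustrative only: the function name and the sample numbers are not from the paper.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by linear regression in log-log space.

    Taking logs turns the power law into a line:
    log y = log a + b * log x, so an ordinary degree-1 polyfit
    recovers the exponent b (slope) and log a (intercept).
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(slope)  # (a, b)

# Hypothetical data: normalized score vs. number of training environments.
envs = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
score = 0.2 * envs ** 0.35  # exact power law, so the fit recovers it

a, b = fit_power_law(envs, score)
```

On real rollout scores the points would scatter around the fitted line; plotting them on log-log axes (where a power law is a straight line) is the standard visual check.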
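The config also lists temporal ensembling (8 steps, adaptation rate -0.01), a technique from ACT in which overlapping action predictions for the same timestep are averaged with exponential weights. A minimal sketch, assuming weights exp(rate · i) over oldest-first predictions; the exact indexing and sign convention used by the authors is an assumption here:

```python
import numpy as np

def temporal_ensemble(predictions, adaptation_rate=-0.01):
    """ACT-style temporal ensembling (sketch, not the authors' code).

    predictions: array of shape (k,) or (k, action_dim) holding the k
    predictions made for the *same* timestep, oldest first.
    The i-th prediction gets weight exp(adaptation_rate * i), normalized
    to sum to 1; the returned action is the weighted average.
    """
    preds = np.asarray(predictions, dtype=float)
    weights = np.exp(adaptation_rate * np.arange(len(preds)))
    weights /= weights.sum()
    return np.tensordot(weights, preds, axes=1)
```

With a rate of -0.01 the weights are nearly uniform, so ensembling mostly smooths jitter between successive predictions rather than strongly favoring old or new ones.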