Data Scaling Laws in Imitation Learning for Robotic Manipulation

Authors: Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects.
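The power-law claim above can be illustrated by fitting performance against environment count via linear regression in log-log space. This is a minimal sketch with synthetic numbers, not the paper's data; `fit_power_law` is a hypothetical helper name.

```python
import numpy as np

def fit_power_law(n, y):
    """Fit y ≈ a * n^b by linear regression in log-log space; returns (a, b)."""
    b, log_a = np.polyfit(np.log(n), np.log(y), 1)
    return np.exp(log_a), b

# Synthetic example: performance grows exactly as 2 * n^0.5.
n = np.array([1, 2, 4, 8, 16, 32])
y = 2.0 * n ** 0.5
a, b = fit_power_law(n, y)
print(round(a, 3), round(b, 3))  # → 2.0 0.5
```

On real rollout data the fit would be noisy; the exponent b then summarizes how quickly generalization improves as environments or objects are added.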
Researcher Affiliation Academia Fanqi Lin1,2,3 Yingdong Hu1,2,3 Pingyue Sheng1 Chuan Wen1,2,3 Jiacheng You1 Yang Gao1,2,3 1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Shanghai Artificial Intelligence Laboratory
Pseudocode No The paper describes the methodologies in prose and does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Project page: https://data-scaling-laws.github.io/. To further support researchers in this endeavor, we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems.
Open Datasets Yes Existing robotic manipulation datasets do not provide enough environments and objects for a single task to meet our requirements. Therefore, we opt to use the Universal Manipulation Interface (UMI) (Chi et al., 2024), a hand-held gripper, to independently collect a substantial number of demonstrations. ... we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems.
Dataset Splits Yes To evaluate the generalization performance of the policy, we exclusively test it in unseen environments or with unseen objects. ... In total, 21 policies are trained, and each is evaluated using 8 unseen objects in the same environment as the training data, with 5 trials per object. ... Each policy is evaluated in 8 unseen environments using the same object as in training, with 5 trials per environment. ... Each policy is evaluated in 8 unseen environments, using two unseen objects per environment, with 5 trials per environment. ... To calculate MSE, we collect 30 human demonstrations for each evaluation environment or object, forming the validation set.
Hardware Specification Yes Policy inference is performed on a workstation equipped with an NVIDIA 4090 GPU (24 GB VRAM). ... it takes 75 hours to complete on 8 A800 GPUs.
Software Dependencies No The paper mentions several techniques and models like Diffusion Policy, U-Net, DDIM, DINOv2, ImageNet, ResNet, CLIP ViT, ACT, and LoRA, but it does not specify any software libraries or frameworks with explicit version numbers.
Experiment Setup Yes Specifically, the policy trained on the smallest dataset undergoes 800 epochs, totaling 5.3 × 10^4 training steps. The policy trained on the largest dataset undergoes 75 epochs, totaling 5 × 10^5 training steps, which takes 75 hours to complete on 8 A800 GPUs. ...
  Image observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks)
  Proprioception observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks)
  Action horizon: 16
  Observation resolution: 224 × 224
  Environment frequency: 5
  Optimizer: AdamW
  Optimizer momentum: β1, β2 = 0.95, 0.999
  Learning rate (action diffusion model): 3e-4
  Learning rate (visual encoder): 3e-5
  Learning rate schedule: cosine decay
  Batch size: 256
  Inference denoising iterations: 16
  Temporal ensemble steps: 8
  Temporal ensemble adaptation rate: -0.01
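The temporal ensemble entries in the config (8 steps, adaptation rate -0.01) can be sketched as an ACT-style weighted average of overlapping action predictions. The exponential weighting w_i = exp(k·i) is our assumption about how the adaptation rate k is applied, not a transcription of the authors' code.

```python
import numpy as np

def temporal_ensemble(predictions, k=-0.01):
    """Combine overlapping predictions of the current action.

    predictions: array of shape (m, action_dim), predictions[0] being the
    oldest. Weights follow an assumed ACT-style scheme w_i = exp(k * i),
    normalized to sum to 1; with k < 0, older predictions weigh more.
    """
    m = len(predictions)
    w = np.exp(k * np.arange(m))
    w /= w.sum()
    return (w[:, None] * predictions).sum(axis=0)

# Usage: 8 overlapping predictions (ensemble steps = 8) of a 7-DoF action.
preds = np.ones((8, 7))
action = temporal_ensemble(preds, k=-0.01)  # identical inputs → same action back
```

With a rate this small the weights are nearly uniform, so the ensemble mostly smooths jitter between successive predictions rather than strongly favoring any one of them.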