What Matters in Learning from Large-Scale Datasets for Robot Manipulation
Authors: Vaibhav Saxena, Matthew Bronars, Nadun Ranawaka Arachchige, Kuancheng Wang, Woo Chul Shin, Soroush Nasiriany, Ajay Mandlekar, Danfei Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct a large-scale dataset composition study to answer this question. Our experiments with MimicLabs yield several critical insights; for example, we find that camera poses and spatial arrangements are crucial dimensions for both diversity in collection and alignment in retrieval. In real-world robot learning settings, we find that not only do our insights from simulation carry over, but our retrieval strategies on existing datasets such as DROID allow us to consistently outperform existing training strategies by up to 70%. |
| Researcher Affiliation | Collaboration | Vaibhav Saxena1, Matthew Bronars1, Nadun Ranawaka Arachchige1, Kuancheng Wang1, Woo Chul Shin1, Soroush Nasiriany2, Ajay Mandlekar3, Danfei Xu1,3 — 1Georgia Institute of Technology, 2The University of Texas at Austin, 3NVIDIA |
| Pseudocode | No | The paper describes methods and processes textually, such as procedural task and demonstration generation, but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper mentions 'More results at https://robo-mimiclabs.github.io/', which is a project website. It does not explicitly state that the source code for the methodology described in the paper is available at this link or elsewhere, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Recent large-scale efforts such as DROID (Khazatsky et al., 2024) and the Open X-Embodiment datasets (Collaboration et al., 2023) have amassed millions of trajectories for table-top manipulation tasks. |
| Dataset Splits | Yes | All experiments use 10 target demos and 1000 co-training demos. We include highlight results in Table 2, with full results in Fig. 13. ... All our co-training experiments used 10 target demonstrations and 1000 co-training demonstrations (unless otherwise specified). We evaluate checkpoints every 50 epochs and report the peak success rate. Each evaluation is obtained by rolling out the policy 30 times. |
| Hardware Specification | No | The paper describes the robot hardware and sensors used for data collection and experiments (e.g., 'Franka robotic arm with a Robotiq gripper', 'Zed 2 stereo cameras', 'RealSense D435 cameras'), but it does not specify any computing hardware like GPU models, CPU models, or cloud computing resources used for training the models. |
| Software Dependencies | No | The paper mentions using BC-RNN and Diffusion Policy for training, ResNet18 as a visual backbone, and the Adam optimizer. However, it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We train all models for simulation for 500 epochs (250k gradient steps) using the Adam optimizer (Kingma & Ba, 2014) with learning rate 1e-4. All our co-training experiments used 10 target demonstrations and 1000 co-training demonstrations (unless otherwise specified). We evaluate checkpoints every 50 epochs and report the peak success rate. Each evaluation is obtained by rolling out the policy 30 times. Table 4: BC-RNN Policy Hyperparameters for Simulation Experiments (batch size 32, sequence length 10, optimizer Adam, learning rate 1e-4, image encoder ResNet18, image resolution (128, 128)). Table 6: Diffusion Policy Hyperparameters for Real Robot Experiments (batch size 128, observation horizon (To) 2, action horizon (Ta) 8, prediction horizon (Tp) 16, diffusion method DDIM, optimizer Adam, learning rate 1e-4, image encoder ResNet18, image resolution (128, 128)). |
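The hyperparameters quoted from Tables 4 and 6 of the paper can be collected into plain configuration dicts for reference. This is a hypothetical sketch: the key names and dict structure are illustrative and do not correspond to any configuration file released by the authors.

```python
# Sketch of the reported hyperparameters, transcribed from the paper's
# Table 4 (BC-RNN, simulation) and Table 6 (Diffusion Policy, real robot).
# Key names are illustrative, not from any released config.

BC_RNN_SIM_CONFIG = {
    "batch_size": 32,
    "sequence_length": 10,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "image_encoder": "ResNet18",
    "image_resolution": (128, 128),
    "epochs": 500,            # ~250k gradient steps in simulation
    "eval_every_epochs": 50,  # peak success rate over 30 rollouts reported
}

DIFFUSION_POLICY_REAL_CONFIG = {
    "batch_size": 128,
    "observation_horizon": 2,   # To
    "action_horizon": 8,        # Ta
    "prediction_horizon": 16,   # Tp
    "diffusion_method": "DDIM",
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "image_encoder": "ResNet18",
    "image_resolution": (128, 128),
}
```

Both policies share the same optimizer, learning rate, visual backbone, and image resolution; they differ in batch size and in the sequence/horizon parameters specific to each architecture.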