What Matters in Learning from Large-Scale Datasets for Robot Manipulation
Authors: Vaibhav Saxena, Matthew Bronars, Nadun Ranawaka Arachchige, Kuancheng Wang, Woo Chul Shin, Soroush Nasiriany, Ajay Mandlekar, Danfei Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct a large-scale dataset composition study to answer this question. Our experiments with MimicLabs yield several critical insights; for example, we find that camera poses and spatial arrangements are crucial dimensions for both diversity in collection and alignment in retrieval. In real-world robot learning settings, we find that not only do our insights from simulation carry over, but our retrieval strategies on existing datasets such as DROID allow us to consistently outperform existing training strategies by up to 70%. |
| Researcher Affiliation | Collaboration | Vaibhav Saxena1, Matthew Bronars1, Nadun Ranawaka Arachchige1, Kuancheng Wang1, Woo Chul Shin1, Soroush Nasiriany2, Ajay Mandlekar3, Danfei Xu1,3 — 1Georgia Institute of Technology, 2The University of Texas at Austin, 3NVIDIA |
| Pseudocode | No | The paper describes methods and processes textually, such as procedural task and demonstration generation, but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper mentions 'More results at https://robo-mimiclabs.github.io/', which is a project website. It does not explicitly state that the source code for the methodology described in the paper is available at this link or elsewhere, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Recent large-scale efforts such as DROID (Khazatsky et al., 2024) and the Open X-Embodiment datasets (Collaboration et al., 2023) have amassed millions of trajectories for table-top manipulation tasks. |
| Dataset Splits | Yes | All experiments use 10 target demos and 1000 co-training demos. We include highlight results in Table 2, with full results in Fig. 13. ... All our co-training experiments used 10 target demonstrations and 1000 co-training demonstrations (unless otherwise specified). We evaluate checkpoints every 50 epochs and report the peak success rate. Each evaluation is obtained by rolling out the policy 30 times. |
| Hardware Specification | No | The paper describes the robot hardware and sensors used for data collection and experiments (e.g., 'Franka robotic arm with a Robotiq gripper', 'Zed 2 stereo cameras', 'RealSense D435 cameras'), but it does not specify any computing hardware like GPU models, CPU models, or cloud computing resources used for training the models. |
| Software Dependencies | No | The paper mentions using BC-RNN and Diffusion Policy for training, ResNet18 as a visual backbone, and the Adam optimizer. However, it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We train all models for simulation for 500 epochs (250k gradient steps) using the Adam optimizer (Kingma & Ba, 2014) with learning rate 1e-4. All our co-training experiments used 10 target demonstrations and 1000 co-training demonstrations (unless otherwise specified). We evaluate checkpoints every 50 epochs and report the peak success rate. Each evaluation is obtained by rolling out the policy 30 times. Table 4: BC-RNN Policy Hyperparameters for Simulation Experiments (batch size 32, sequence length 10, optimizer Adam, learning rate 1e-4, image encoder ResNet18, image resolution (128, 128)). Table 6: Diffusion Policy Hyperparameters for Real Robot Experiments (batch size 128, observation horizon (To) 2, action horizon (Ta) 8, prediction horizon (Tp) 16, diffusion method DDIM, optimizer Adam, learning rate 1e-4, image encoder ResNet18, image resolution (128, 128)). |
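The hyperparameters quoted from Tables 4 and 6 of the paper can be collected into plain configuration dicts for reference. This is a hypothetical sketch: the key names and dict structure are illustrative and do not correspond to any configuration file released by the authors.

```python
# Sketch of the reported hyperparameters, transcribed from the paper's
# Table 4 (BC-RNN, simulation) and Table 6 (Diffusion Policy, real robot).
# Key names are illustrative, not from any released config.

BC_RNN_SIM_CONFIG = {
    "batch_size": 32,
    "sequence_length": 10,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "image_encoder": "ResNet18",
    "image_resolution": (128, 128),
    "epochs": 500,            # ~250k gradient steps in simulation
    "eval_every_epochs": 50,  # peak success rate over 30 rollouts reported
}

DIFFUSION_POLICY_REAL_CONFIG = {
    "batch_size": 128,
    "observation_horizon": 2,   # To
    "action_horizon": 8,        # Ta
    "prediction_horizon": 16,   # Tp
    "diffusion_method": "DDIM",
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "image_encoder": "ResNet18",
    "image_resolution": (128, 128),
}
```

Both policies share the same optimizer, learning rate, visual backbone, and image resolution; they differ in batch size and in the sequence/horizon parameters specific to each architecture.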