Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Authors: Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments in both simulation and on a real robot validate the superiority of the proposed representation. Empirical results across four simulation domains with 20 robotic manipulation tasks demonstrate that MCR outperforms the strongest baseline by 14.8%. Additionally, MCR significantly boosts the success rate in three real-world manipulation tasks by 76.9%.
Researcher Affiliation | Academia | University of California, San Diego; Tongji University; Shanghai Jiao Tong University; University of Maryland, College Park; Tsinghua University
Pseudocode | No | The paper describes the proposed method MCR, including objective functions (Equations 1-4), illustrations (Figure 5), and PyTorch-like implementation details in Appendix C.4, but does not include a clearly labeled pseudocode or algorithm block for the overall method.
Open Source Code | Yes | Project website: robots-pretrain-robots.github.io.
Open Datasets | Yes | Specifically, we pre-train a visual encoder on the DROID (Khazatsky et al., 2024) robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. ... We select a total of 20 tasks across 4 simulation environments: Robomimic (Mandlekar et al., 2021), RoboCasa (Nasiriany et al., 2024), Meta-World (Yu et al., 2019), and DexArt (Bao et al., 2023).
Dataset Splits | No | For the DROID dataset used for pre-training, the paper states "After processing, we retain 36k trajectories for pre-training," without specifying any train/validation/test splits. For downstream tasks, it specifies the number of demonstrations used for training (e.g., 200 demonstrations for Robomimic, 50 for RoboCasa) and mentions evaluating success rate over "at least 20 episodes," but does not provide explicit numerical splits of these demonstration datasets into training, validation, and test sets.
Hardware Specification | Yes | The whole training process takes 50 hours on a single NVIDIA 3090.
Software Dependencies | No | The paper mentions "Our codebase is built upon the implementation of R3M" and "We utilize the PyTorch-Grad-CAM library to generate Grad-CAM figures," but does not provide specific version numbers for PyTorch, PyTorch-Grad-CAM, or other software libraries used.
Experiment Setup | Yes | Pre-training hyperparameters are listed in Table 6: encoder ResNet50, batch size 32, learning rate 1e-4, 500,000 training steps, Adam optimizer. Downstream policy learning settings are introduced in Section C.5: "The downstream BC policy is a three-layer MLP with ReLU activations and hidden sizes of 256. ... trained with a mean squared error (MSE) loss, a learning rate of 0.001, and a batch size of 256."
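The downstream BC policy described in the Experiment Setup row (a three-layer MLP with 256-unit hidden layers, ReLU activations, and an MSE loss) can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' implementation: the 2048-dimensional input (a typical ResNet50 feature size) and the 7-dimensional action output are assumptions, and only the forward pass and loss are shown, not the training loop.

```python
import random

def linear(x, W, b):
    # y = W @ x + b, with W as a list of rows
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    # He initialization, standard for ReLU networks
    scale = (2.0 / n_in) ** 0.5
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

class BCPolicy:
    """Three-layer MLP: obs_dim -> 256 -> 256 -> action_dim, ReLU between layers.

    Interprets "three-layer MLP with hidden sizes of 256" as two 256-unit
    hidden layers plus a linear output layer (an assumption, not stated
    explicitly in the report).
    """

    def __init__(self, obs_dim, action_dim, hidden=256, seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(obs_dim, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, action_dim, rng),
        ]

    def __call__(self, obs):
        x = obs
        for i, (W, b) in enumerate(self.layers):
            x = linear(x, W, b)
            if i < len(self.layers) - 1:  # no activation on the output layer
                x = relu(x)
        return x

def mse_loss(pred, target):
    # Behavior cloning regresses predicted actions onto demonstrated actions
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Illustrative dimensions: 2048-dim visual features in, 7-dim action out
policy = BCPolicy(obs_dim=2048, action_dim=7)
action = policy([0.1] * 2048)
```

In the paper's setting this network would sit on top of the frozen (or fine-tuned) MCR visual encoder and be trained with Adam-style updates at a learning rate of 0.001 and batch size 256, as quoted above.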