LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments across multiple simulated and real-world tasks, we demonstrate that LLaRA achieves state-of-the-art performance while preserving the generalization capabilities of large language models."
Researcher Affiliation | Academia | 1Stony Brook University, 2University of Wisconsin-Madison
Pseudocode | No | The paper describes its methods and data generation in text and uses diagrams (e.g., Fig. 1, Fig. 2, Fig. 4) to illustrate concepts and data formats, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks detailing a structured algorithm.
Open Source Code | Yes | "The code, datasets, and pretrained models are available at https://github.com/LostXine/LLaRA."
Open Datasets | Yes | "We employ VIMA-Bench (Jiang et al., 2023), a simulated table-top robot manipulation environment, to evaluate VLMs trained by our instruction tuning dataset. The code, datasets, and pretrained models are available at https://github.com/LostXine/LLaRA."
Dataset Splits | Yes | "We uniformly subsample the VIMA dataset (Jiang et al., 2023) to form three subsets with different sizes: VIMA-0.8k, VIMA-8k, and VIMA-80k, where the number indicates the number of expert trajectories in the dataset. We train all methods on these three datasets and evaluate them with 3 levels of difficulty following the test protocol (L1 to L3)."
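The uniform subsampling described in that quote can be sketched as below. This is an illustrative assumption about the procedure (evenly spaced selection of expert trajectories), not the paper's actual code; the function name is hypothetical.

```python
def uniform_subsample(trajectories, k):
    """Pick k trajectories at evenly spaced indices from the full dataset.

    A sketch of "uniformly subsample": for VIMA, k would be 800, 8_000,
    or 80_000 to build VIMA-0.8k, VIMA-8k, and VIMA-80k respectively.
    """
    n = len(trajectories)
    if k >= n:
        return list(trajectories)
    step = n / k  # fractional stride keeps the selection evenly spread
    return [trajectories[int(i * step)] for i in range(k)]
```

For example, subsampling 10 items from a list of 100 returns the items at indices 0, 10, 20, ..., 90.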
Hardware Specification | No | In Appendix C.1 (Environment Setting), the paper states: "We utilize an xArm7 robot arm equipped with a gripper and a Logitech C140 RGB webcam positioned above the arm to gather observations." This describes the robot hardware for real-world experiments but does not specify the computational hardware (e.g., GPU models, CPU types, memory) used for training or running the models.
Software Dependencies | No | The paper mentions using a "pretrained LLaVA-1.5-7B (Liu et al., 2024b) model" and models such as GPT-4 (OpenAI, 2023) and OWLv2 (Minderer et al., 2024). However, it does not provide version numbers for underlying software dependencies such as Python, PyTorch, or CUDA that would be needed to replicate the experiment environment.
Experiment Setup | Yes | "The training settings closely align with those of the original LLaVA stage 2. Specifically, we utilize a single-cycle cosine annealing schedule with a 0.03 warm-up ratio and a maximum learning rate of 2×10⁻⁵. However, for VIMA-0.8k and VIMA-8k, we employ a batch size of 32, whereas for VIMA-80k, we restore the batch size to 128."
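The quoted schedule (linear warm-up over the first 3% of steps, then a single cosine decay from a peak of 2×10⁻⁵) can be written as a small step-to-LR function. This is a minimal sketch under those stated hyperparameters; the function name and the zero minimum learning rate are assumptions, not details from the paper.

```python
import math

def cosine_lr(step, total_steps, max_lr=2e-5, warmup_ratio=0.03, min_lr=0.0):
    """Single-cycle cosine annealing with linear warm-up.

    Matches the quoted setup: warmup_ratio=0.03, max_lr=2e-5.
    min_lr=0.0 is an assumed default.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warm-up from ~0 to max_lr over the first 3% of steps.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The learning rate peaks exactly at the end of warm-up (e.g., step 29 of 1000 with a 0.03 ratio) and decays toward zero by the final step.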