ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Authors: Hongyin Zhang, Zifeng Zhuang, Han Zhao, Pengxiang Ding, Hongchao Lu, Donglin Wang

ICML 2025

Reproducibility assessment: each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "Extensive experiments show that ReinboT achieves state-of-the-art performance on the CALVIN mixed-quality dataset and exhibits superior few-shot learning and out-of-distribution generalization capabilities in real-world tasks."
Researcher Affiliation: Academia. "(1) Zhejiang University, Hangzhou, China; (2) Westlake University, Hangzhou, China. Correspondence to: Donglin Wang <EMAIL>."
Pseudocode: Yes. "Algorithm 1 ReinboT: Test-time Execution"
Open Source Code: No. The paper does not contain an unambiguous statement of code release or a link to a code repository for the described methodology.
Open Datasets: Yes. "We first construct a mixed-quality dataset based on CALVIN (Mees et al., 2022)... We initialize its weights with the pre-trained model weights, which are derived from the generated video pre-training on the Ego4d (Grauman et al., 2022) dataset consistent with GR-1."
Dataset Splits: Yes. "This dataset contains a small amount of data with language instructions in CALVIN ABC (about 50 trajectories per task) and a large amount of autonomous data without language instructions. In addition to the original data collected by human teleoperation without language instructions in CALVIN (more than 20,000 trajectories), the autonomous data also contains failure data generated by the interaction between the trained VLA behavioral policy RoboFlamingo (Li et al., b) and the environment CALVIN D (more than 10,000 trajectories). We study training on this mixed-quality data, then fine-tune on a small amount of data with language instructions, and finally test the generalization performance on CALVIN D."
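The split described above can be summarized in a short composition sketch. This is a hypothetical data structure for illustration, not the paper's code; the field names are invented, and the counts are the approximate figures and lower bounds quoted in the passage:

```python
# Hypothetical summary of the CALVIN mixed-quality dataset composition
# described in the quoted passage; all field names are illustrative.
mixed_quality_data = {
    "language_annotated": {           # CALVIN ABC, used for fine-tuning
        "trajectories_per_task": 50,  # approximate, per the paper
    },
    "teleop_unlabeled": {             # human teleoperation, no language
        "min_trajectories": 20_000,
    },
    "autonomous_failures": {          # RoboFlamingo rollouts in CALVIN D
        "min_trajectories": 10_000,
    },
}

# The unlabeled autonomous data dominates the mix over the small
# language-annotated portion.
unlabeled_total = (
    mixed_quality_data["teleop_unlabeled"]["min_trajectories"]
    + mixed_quality_data["autonomous_failures"]["min_trajectories"]
)
print(unlabeled_total)  # 30000
```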
Hardware Specification: No. The paper mentions conducting real-world tasks on a UR5 robotic arm, which is a robot platform, but does not provide specific details about the computational hardware (e.g., GPU or CPU models) used for training or running the experiments.
Software Dependencies: No. The paper mentions using the "GPT2 (Radford et al., 2019) structure" and "Optimizer Adam (Kingma, 2014)", but does not provide specific version numbers for any software libraries or frameworks (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "Table 3. Network hyperparameters configuration. Table 4. Training hyperparameters configuration." Reported training hyperparameters:
  Return-to-go loss weight λ: 0.001
  Expectile regression parameter m: 0.9
  Gradient clip: 1.0
  Epochs: 50
  Warm-up epochs: 1
  Batch size: 32
  Learning rate: 0.001
  Weight decay: 0.01
  Dropout rate: 0.1
  Reward weights {w_i}, i = 1..4: 0.1, 0.1, 0.01, 0.1
  Optimizer: Adam (β1 = 0.9, β2 = 0.999)
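The expectile regression parameter m = 0.9 above points to an asymmetric (IQL-style) regression loss for return estimation. The paper's exact loss is not quoted in this report, so the following is a minimal sketch of the standard expectile loss only, with m as in the table; the function name and signature are illustrative:

```python
def expectile_loss(pred, target, m=0.9):
    """Standard scalar expectile regression loss.

    Errors where the target exceeds the prediction are weighted by m,
    and errors where the prediction exceeds the target by (1 - m), so
    with m = 0.9 the fit tracks an upper expectile of the targets.
    """
    u = target - pred
    weight = m if u > 0 else (1.0 - m)
    return weight * u * u


# With m = 0.9, under-predicting by 1 costs 0.9, while over-predicting
# by 1 costs only about 0.1.
print(expectile_loss(0.0, 1.0))  # 0.9
print(expectile_loss(1.0, 0.0))  # ≈ 0.1
```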