STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Authors: Marius Memmel, Jacob Berg, Bingqing Chen, Abhishek Gupta, Jonathan Francis
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations. |
| Researcher Affiliation | Collaboration | Marius Memmel¹, Jacob Berg¹, Bingqing Chen², Abhishek Gupta¹, Jonathan Francis²,³. ¹Paul G. Allen School of Computer Science & Engineering, University of Washington; ²Robot Learning Lab, Bosch Center for Artificial Intelligence; ³Robotics Institute, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 STRAP (Dtarget, Dprior, K, ϵ, F) |
| Open Source Code | No | Project website at https://weirdlabuw.github.io/strap/. The paper mentions a project website but does not explicitly state that the source code for the described methodology is available there. It also refers to third-party tools such as 'robomimic', but not to code of its own. |
| Open Datasets | Yes | We demonstrate the efficacy of STRAP in simulation on the LIBERO benchmark (Liu et al., 2024) and in two real-world scenarios following the DROID (Khazatsky et al., 2024) hardware setup. We initialize the ResNet-18 (He et al., 2015) vision encoders of our policy with weights pre-trained on ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | LIBERO: We randomly sample 5 demonstrations from LIBERO-10 as Dtarget and utilize the 4500 trajectories in LIBERO-90 as Dprior. ... DROID-Kitchen: ...We collect 150 multi-task demonstrations across scenes in Dprior (Kitchen), with 50 per scene. ... Dtarget contains three demos of a single task per environment. |
| Hardware Specification | Yes | We benchmark Huggingface's DINOv2 implementation on an NVIDIA L40S 46GB using batch size 32. ... Training a single policy takes 35 min (average over 10 trials) on an NVIDIA L40S 46GB. |
| Software Dependencies | No | The paper mentions using 'Huggingface's DINOv2 implementation', 'numba', and 'robomimic' but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | All results are reported over 3 training and evaluation seeds (1234, 42, 4325). We fixed the number of segments retrieved to 100, the camera viewpoint to the agent-view image for retrieval, and the number of expert demonstrations to 5. ... Our transformer policy was trained for 300 epochs with batch size 32, with one epoch corresponding to 200 gradient steps. |
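The pseudocode row above names Algorithm 1 STRAP(Dtarget, Dprior, K, ε, F), whose core step matches target sub-trajectories against a prior dataset via subsequence dynamic time warping (S-DTW) over foundation-model features. A minimal sketch of that matching step, assuming frame features are already extracted as NumPy arrays; the function names `subsequence_dtw_cost` and `retrieve_top_k` are illustrative, not taken from the paper:

```python
import numpy as np

def subsequence_dtw_cost(query, sequence):
    """Minimum DTW cost of aligning `query` against any contiguous
    sub-sequence of `sequence`, using Euclidean frame distance.
    Unlike standard DTW, the match may start and end anywhere."""
    n, m = len(query), len(sequence)
    # Pairwise distances between query frames and sequence frames.
    dist = np.linalg.norm(query[:, None, :] - sequence[None, :, :], axis=-1)
    # acc[i, j]: best cost of a warp path ending at (i, j).
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may begin at any sequence frame
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    return acc[-1, :].min()  # the match may end at any sequence frame

def retrieve_top_k(target_segments, prior_trajectories, k):
    """Score every prior trajectory against every target segment by
    S-DTW cost and return the k lowest-cost (cost, trajectory_index) pairs."""
    scored = []
    for seg in target_segments:
        for idx, traj in enumerate(prior_trajectories):
            scored.append((subsequence_dtw_cost(seg, traj), idx))
    scored.sort(key=lambda t: t[0])
    return scored[:k]
```

In the actual method the features F would come from a pre-trained vision encoder (the paper benchmarks DINOv2), and the retrieved segments augment the target demonstrations for policy training; this sketch only illustrates the retrieval scoring.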