STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Authors: Marius Memmel, Jacob Berg, Bingqing Chen, Abhishek Gupta, Jonathan Francis
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations. |
| Researcher Affiliation | Collaboration | Marius Memmel¹, Jacob Berg¹, Bingqing Chen², Abhishek Gupta¹, Jonathan Francis²,³. ¹Paul G. Allen School of Computer Science & Engineering, University of Washington; ²Robot Learning Lab, Bosch Center for Artificial Intelligence; ³Robotics Institute, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 STRAP (Dtarget, Dprior, K, ϵ, F) |
| Open Source Code | No | Project website at https://weirdlabuw.github.io/strap/. The paper mentions a project website but does not explicitly state that the source code for the described methodology is available there. It also refers to third-party tools such as 'robomimic', but not to code of its own. |
| Open Datasets | Yes | We demonstrate the efficacy of STRAP in simulation on the LIBERO benchmark (Liu et al., 2024) and in two real-world scenarios following the DROID (Khazatsky et al., 2024) hardware setup. We initialize the ResNet-18 (He et al., 2015) vision encoders of our policy with weights pre-trained on ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | LIBERO: We randomly sample 5 demonstrations from LIBERO-10 as Dtarget and utilize the 4500 trajectories in LIBERO-90 as Dprior. ... DROID-Kitchen: ...We collect 150 multi-task demonstrations across scenes in Dprior (Kitchen), with 50 per scene. ... Dtarget contains three demos of a single task per environment. |
| Hardware Specification | Yes | We benchmark Huggingface's DINOv2 implementation on an NVIDIA L40S 46GB using batch size 32. ... Training a single policy takes 35 min (average over 10 trials) on an NVIDIA L40S 46GB. |
| Software Dependencies | No | The paper mentions using 'Huggingface's DINOv2 implementation', 'numba', and 'robomimic' but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | All results are reported over 3 training and evaluation seeds (1234, 42, 4325). We fixed the number of segments retrieved to 100, the camera viewpoint to the agent-view image for retrieval, and the number of expert demonstrations to 5. ... Our transformer policy was trained for 300 epochs with batch size 32, with one epoch corresponding to 200 gradient steps. |
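The pseudocode row above names Algorithm 1 STRAP(Dtarget, Dprior, K, ε, F), whose core step matches target sub-trajectories against a prior dataset via subsequence dynamic time warping (S-DTW) over foundation-model features. A minimal sketch of that matching step, assuming frame features are already extracted as NumPy arrays; the function names `subsequence_dtw_cost` and `retrieve_top_k` are illustrative, not taken from the paper:

```python
import numpy as np

def subsequence_dtw_cost(query, sequence):
    """Minimum DTW cost of aligning `query` against any contiguous
    sub-sequence of `sequence`, using Euclidean frame distance.
    Unlike standard DTW, the match may start and end anywhere."""
    n, m = len(query), len(sequence)
    # Pairwise distances between query frames and sequence frames.
    dist = np.linalg.norm(query[:, None, :] - sequence[None, :, :], axis=-1)
    # acc[i, j]: best cost of a warp path ending at (i, j).
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may begin at any sequence frame
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    return acc[-1, :].min()  # the match may end at any sequence frame

def retrieve_top_k(target_segments, prior_trajectories, k):
    """Score every prior trajectory against every target segment by
    S-DTW cost and return the k lowest-cost (cost, trajectory_index) pairs."""
    scored = []
    for seg in target_segments:
        for idx, traj in enumerate(prior_trajectories):
            scored.append((subsequence_dtw_cost(seg, traj), idx))
    scored.sort(key=lambda t: t[0])
    return scored[:k]
```

In the actual method the features F would come from a pre-trained vision encoder (the paper benchmarks DINOv2), and the retrieved segments augment the target demonstrations for policy training; this sketch only illustrates the retrieval scoring.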