Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Instant Policy: In-Context Imitation Learning via Graph Diffusion

Authors: Vitalis Vosylius, Edward Johns

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
LLM Response: "Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks."

Researcher Affiliation: Academia
LLM Response: "Vitalis Vosylius and Edward Johns, The Robot Learning Lab at Imperial College London, EMAIL"

Pseudocode: No
LLM Response: The paper describes the methodology using textual explanations, mathematical equations (e.g., Equations 1 to 6), and diagrams (Figures 1, 2, 3, 4, 9, 10). However, there are no explicitly labelled pseudocode or algorithm blocks presenting structured steps.

Open Source Code: Yes
LLM Response: "Code and videos are available at https://www.robot-learning.uk/instant-policy."

Open Datasets: Yes
LLM Response: "Firstly, to ensure generalisation across object geometries, we populate a simulated environment using a diverse range of objects from the ShapeNet dataset (Chang et al., 2015). We use a standard RLBench setup using the Franka Emika Panda and test Instant Policy (IP) and the baselines on 24 tasks."

Dataset Splits: Yes
LLM Response: "Additionally, we test models trained using only pseudo-demonstrations (PD only) and a combination of pseudo-demonstrations and 20 demonstrations for each of 12 out of the 24 RLBench tasks (PD++). ... For the design choices and inference parameters, we calculate the average change in success rate on 24 unseen RLBench tasks, with respect to the base model used in the previous set of experiments, while for the scaling trends, we report validation loss on a hold-out set of pseudo-demonstrations to see how well it can capture the underlying data distribution."

Hardware Specification: Yes
LLM Response: "We trained our model for 2.5M optimisation steps using pseudo-demonstrations that are continuously generated in parallel, which is roughly equivalent to using 700K unique trajectories. During training, we randomise the number of demos in context between 1 and 5. When we discuss integrating additional training data beyond pseudo-demonstrations, we refer to models fine-tuned for an additional 100K optimisation steps using a 50/50 mix of pseudo-demonstrations and new data. ... approx. 5 days on a single NVIDIA GeForce RTX 3080 Ti. We use a Sawyer robot with a Robotiq 2F-85 gripper and two external RealSense D415 depth cameras."

Software Dependencies: No
LLM Response: The paper mentions several software components and tools, such as the AdamW (Loshchilov, 2017) optimiser, torch.compile capabilities (Paszke et al., 2019), XMem++ (Bekuzarov et al., 2023), SAM (Kirillov et al., 2023), MediaPipe (Lugaresi et al., 2019), PyRender (Matl, 2019), and Sentence-BERT (Reimers, 2019). However, it does not provide specific version numbers for these or for general software dependencies such as Python or PyTorch.

Experiment Setup: Yes
LLM Response: "In all our experiments, unless explicitly stated otherwise, we use a single model to perform various manipulation tasks by providing N=2 demos, expressed as L=10 waypoints, as context, and predict T=8 future actions. We train this model for 2.5M optimisation steps using pseudo-demonstrations that are continuously generated in parallel... We trained our model using the AdamW (Loshchilov, 2017) optimiser with a 1e-5 learning rate for 2.5M optimisation steps (approx. 5 days on a single NVIDIA GeForce RTX 3080 Ti), followed by a 50K-step learning rate cool-down period."
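The training schedule quoted above (a 1e-5 learning rate held for 2.5M optimisation steps, then a 50K-step cool-down) can be sketched as a simple schedule function. This is an illustrative sketch, not the authors' code: the paper does not specify the shape of the cool-down, so the linear decay below is an assumption, and `learning_rate` is a hypothetical helper name.

```python
# Sketch of the reported schedule: constant 1e-5 for 2.5M steps, then a
# 50K-step cool-down. The linear decay shape is an assumption; the paper
# only states that a cool-down period follows training.
BASE_LR = 1e-5
TRAIN_STEPS = 2_500_000
COOLDOWN_STEPS = 50_000

def learning_rate(step: int) -> float:
    """Learning rate at a given optimisation step."""
    if step < TRAIN_STEPS:
        return BASE_LR  # constant during the main training phase
    remaining = TRAIN_STEPS + COOLDOWN_STEPS - step
    return BASE_LR * max(remaining / COOLDOWN_STEPS, 0.0)  # linear cool-down
```

In a framework such as PyTorch, a function like this would typically be wrapped in a learning-rate scheduler rather than called directly.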