Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Authors: Pingcheng Jian, Easop Lee, Zachary I. Bell, Michael M. Zavlanos, Boyuan Chen

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method in various simulated and real-world manipulation tasks. In simulation, we evaluate PeS in five different robotic manipulation tasks, each with seven unique visual configurations. We also evaluate PeS in four real-world manipulation tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides further insight into the high performance of our proposed method."
Researcher Affiliation | Collaboration | Pingcheng Jian¹, Easop Lee¹, Zachary Bell², Michael M. Zavlanos¹, Boyuan Chen¹ — ¹Duke University, ²Air Force Research Laboratory
Pseudocode | Yes | "The pseudocode of PeS is presented in Appendix B." (Algorithm 1: Zero-shot Transfer with Perception Stitching)
Open Source Code | Yes | generalroboticslab.com/PerceptionStitching — "We list all the parameters of the neural network in Appendix A with our code base."
Open Datasets | Yes | "Evaluation Tasks: We evaluate PeS in five manipulation tasks from the Robomimic (Mandlekar et al., 2021) benchmark (Fig. 4(a))."
Dataset Splits | No | "The dataset of each task contains 200 expert demonstration trajectories."

Algorithm 1: Zero-shot Transfer with Perception Stitching
  Collect Dataset 1 with random sampling: task T in environment E1 with two visual configurations o^{E1}_1 and o^{E1}_2.
  Initialize an empty dataset D1.
  for each game i of the task do
      Randomly sample the initial state of the task.
      Execute the expert policy to collect the expert trajectory τ^1_i of this game.
      Push τ^1_i into the dataset D1.
  end for
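The data-collection loop of Algorithm 1 could be sketched in Python roughly as follows. This is a minimal illustration only: `env`, `expert`, and their method names are hypothetical stand-ins, not the authors' actual interfaces.

```python
def collect_dataset(env, expert, num_games, max_steps=200):
    """Roll out an expert policy and store (observation, action) trajectories.

    Hypothetical sketch of Algorithm 1's collection loop; the real
    environment/expert APIs in the paper's code base may differ.
    """
    dataset = []
    for _ in range(num_games):
        obs = env.reset()          # randomly samples the initial task state
        trajectory = []
        for _ in range(max_steps):
            action = expert.act(obs)
            next_obs, done = env.step(action)
            trajectory.append((obs, action))   # record the expert transition
            obs = next_obs
            if done:
                break
        dataset.append(trajectory)  # push trajectory tau_i into dataset D1
    return dataset
```

Each element of the returned list is one expert trajectory, matching the "200 trajectories per task" dataset described above.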
Hardware Specification | Yes | "We picked the Stack task and trained the policies on an NVIDIA A6000 GPU with a batch size of 32."
Software Dependencies | No | The paper mentions specific components (ResNet-18, MLP, LSTM) but does not provide version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or related libraries.
Experiment Setup | Yes |

Table 6: MLP-based policy hyperparameters.
  Hyperparameter                Default
  Learning Rate                 1 × 10⁻⁴
  Action Decoder MLP Dims       [1024, 1024]
  GMM Num Modes                 5
  Image Encoder                 ResNet-18
  Spatial Softmax (num-KP)      64
  Image Embedding Layer         256 units
  Low-Dim Obs Embedding Layer   64 units

Table 7: RNN-based policy hyperparameters.
  Hyperparameter                Default
  Learning Rate                 1 × 10⁻⁴
  Action Decoder MLP Dims       [ ]
  RNN Hidden Dim                1000
  RNN Seq Len                   10
  GMM Num Modes                 5
  Image Encoder                 ResNet-18
  Spatial Softmax (num-KP)      64
  Image Embedding Layer         256 units
  Low-Dim Obs Embedding Layer   64 units
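The MLP-policy hyperparameters reported above could be captured in code roughly as follows. The key names here are illustrative, not the authors' actual config schema (their code base may use a Robomimic-style config object instead).

```python
# Illustrative config dict mirroring Table 6's MLP-based policy
# hyperparameters; key names are hypothetical, not taken from the
# authors' released code.
mlp_policy_config = {
    "learning_rate": 1e-4,                     # 1 x 10^-4
    "action_decoder_mlp_dims": [1024, 1024],
    "gmm_num_modes": 5,
    "image_encoder": "ResNet-18",
    "spatial_softmax_num_kp": 64,              # number of keypoints
    "image_embedding_dim": 256,
    "low_dim_obs_embedding_dim": 64,
}
```

The RNN-based policy of Table 7 differs mainly in swapping the action-decoder MLP for an RNN (hidden dim 1000, sequence length 10), with the remaining values unchanged.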