Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies
Authors: Pingcheng Jian, Easop Lee, Zachary I. Bell, Michael M. Zavlanos, Boyuan Chen
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in various simulated and real-world manipulation tasks. In simulation, we evaluate PeS in five different robotic manipulation tasks, each with seven unique visual configurations. We also evaluate PeS in four real-world manipulation tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides more insights into the high performance of our proposed method. |
| Researcher Affiliation | Collaboration | Pingcheng Jian¹, Easop Lee¹, Zachary Bell², Michael M. Zavlanos¹, Boyuan Chen¹ — ¹Duke University, ²Air Force Research Laboratory |
| Pseudocode | Yes | The pseudocode of PeS is presented in Appendix B. Algorithm 1: Zero-shot Transfer with Perception Stitching |
| Open Source Code | Yes | generalroboticslab.com/Perception Stitching — We list all the parameters of the neural network in Appendix A with our code base. |
| Open Datasets | Yes | Evaluation Tasks: We evaluate PeS in five manipulation tasks from the Robomimic (Mandlekar et al., 2021) benchmark (Fig. 4(a)) |
| Dataset Splits | No | The dataset of each task contains 200 trajectories of expert demonstrations. Algorithm 1: Zero-shot Transfer with Perception Stitching. Collect Dataset 1 with random sampling: task T in environment E1 with two visual configurations o_1^{E1} and o_2^{E1}. Initialize an empty dataset D1. For each game i of the task: randomly sample the initial state of the task; execute the expert policy to collect the expert trajectory τ_i^1 of this game; push τ_i^1 into the dataset D1. |
| Hardware Specification | Yes | We picked the Stack task and trained the policies on an NVIDIA A6000 GPU with a batch size of 32. |
| Software Dependencies | No | The paper mentions specific components like ResNet-18, MLP, and LSTM, but does not provide specific version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or related libraries. |
| Experiment Setup | Yes | Table 6 (MLP-based policy hyperparameters): Learning Rate 1×10⁻⁴; Action Decoder MLP Dims [1024, 1024]; GMM Num Modes 5; Image Encoder ResNet-18; Spatial Softmax (num-KP) 64; Image Embedding Layer 256 units; Low-Dim Obs Embedding Layer 64 units. Table 7 (RNN-based policy hyperparameters): Learning Rate 1×10⁻⁴; Action Decoder MLP Dims [ ]; RNN Hidden Dim 1000; RNN Seq Len 10; GMM Num Modes 5; Image Encoder ResNet-18; Spatial Softmax (num-KP) 64; Image Embedding Layer 256 units; Low-Dim Obs Embedding Layer 64 units. |
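The dataset-collection step quoted from Algorithm 1 (roll out an expert policy for 200 games and push each trajectory into the dataset) can be sketched as the minimal loop below. The environment, expert policy, and horizon here are illustrative placeholders, not the authors' released implementation.

```python
import random

class DummyEnv:
    """Stand-in for a Robomimic-style task environment (illustrative only)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # Randomly sample the initial state of the task.
        return self.rng.random()

    def step(self, state, action):
        # Toy transition; a real environment returns the next observation.
        return state + action

def expert_policy(state):
    # Placeholder for the trained expert demonstrator used in the paper.
    return 0.1

def collect_dataset(env, num_games=200, horizon=5):
    """Collect expert trajectories tau_i into dataset D1, as in Algorithm 1."""
    dataset = []
    for _ in range(num_games):          # for each game i of the task
        state = env.reset()             # random initial state
        trajectory = []
        for _ in range(horizon):        # execute the expert policy
            action = expert_policy(state)
            trajectory.append((state, action))
            state = env.step(state, action)
        dataset.append(trajectory)      # push tau_i into D1
    return dataset

D1 = collect_dataset(DummyEnv(), num_games=200)
print(len(D1))  # 200 trajectories, matching the dataset size reported above
```

Under this sketch, the same loop would be run once per visual configuration to produce the two datasets that Perception Stitching trains its separate encoders on.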