Learning View-invariant World Models for Visual Robotic Manipulation
Authors: Jing-Cheng Pang, Nan Tang, Kaiyuan Li, Yuting Tang, Xin-Qiang Cai, Zhen-Yu Zhang, Gang Niu, Masashi Sugiyama, Yang Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of ReViWo in various viewpoint disturbance scenarios, including control under novel camera positions and frequent camera shaking, using the Meta-world & Panda Gym environments. We also conduct experiments on a real-world ALOHA robot. The results demonstrate that ReViWo maintains robust performance under viewpoint disturbance, while baseline methods suffer significant performance degradation. Furthermore, we show that the VIR captures task-relevant state information and remains stable for observations from novel viewpoints, validating the efficacy of the ReViWo approach. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China; 2 RIKEN Center for Advanced Intelligence Project, Japan; 3 Polixir.ai, China; 4 The University of Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1 Representation learning for View-invariant World model (ReViWo) |
| Open Source Code | No | The paper mentions using "Offline RL-kit (Sun, 2023)" but does not provide a direct link or explicit statement that *their* methodology's code is open-source or available. |
| Open Datasets | Yes | Meanwhile, ReViWo is simultaneously trained on Open X-Embodiment datasets without view labels. We conduct experiments on two robotics manipulation environments: Meta-world (Yu et al., 2019) and Panda Gym (Gallouédec et al., 2021). Integration of Open X-Embodiment data without view labels. In addition to the data with view labels, we also involve multi-view data without view labels from the Open X-Embodiment dataset (O'Neill et al., 2024), which are readily available on the internet. |
| Dataset Splits | No | The paper describes data collection processes for training the autoencoder and the offline control data, as well as evaluation scenarios (e.g., various azimuth offsets, camera shaking). However, it does not provide explicit training/validation/test splits (e.g., percentages or absolute counts needed for reproduction); instead, it describes training on the collected data and evaluating under different disturbance conditions. |
| Hardware Specification | Yes | We use 64 CPU cores (AMD EPYC 9654 @ 2.4GHz) and 4 GPUs (NVIDIA GeForce RTX 4090) for our experiments. |
| Software Dependencies | Yes | The software stack employed for our experiments includes Python 3.11 and PyTorch 2.1.0. |
| Experiment Setup | Yes | The hyper-parameters for implementing Re Vi Wo are presented in Table 4. For all methods, the model is trained with an offline RL algorithm for 25000 gradient steps, and evaluated for 40 episodes. |