Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
Authors: Kaizhe Hu, Zihang Rui, Yao He, Yuyao Liu, Pu Hua, Huazhe Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical study demonstrates the effectiveness of the approach across a variety of simulated and real-world tasks and a range of different perturbations. Stem-OB proves particularly effective in real-world tasks where appearance and lighting changes hamper the other baselines, yielding an overall success-rate improvement of 22.2%. |
| Researcher Affiliation | Academia | Kaizhe Hu (1,2,3), Zihang Rui (1), Yao He (4), Yuyao Liu (1), Pu Hua (1,2,3), Huazhe Xu (1,2,3); 1 Tsinghua University, 2 Shanghai Qi Zhi Institute, 3 Shanghai AI Lab, 4 Stanford University. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods in prose and equations, such as in Section 4 and Section 5.3, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Reproducibility: The main algorithm of our method is as simple as applying the open-sourced DDPM inversion method to the dataset before training. We’ve provided the code for our method in the supplementary material. |
| Open Datasets | Yes | Our simulation experiments consider different tasks within two frameworks: a photorealistic simulation platform, SAPIEN 3 (Xiang et al., 2020), and a less realistic framework, MimicGen (Mandlekar et al., 2023). We leverage the ManiSkill 3 dataset (Gu et al., 2023; Tao et al., 2024), collected on SAPIEN 3, for benchmarking. |
| Dataset Splits | Yes | The object locations in the training set are randomly initialized within a specified area, and 100 demonstrations are collected per task. For testing, nine predefined target positions are used. [...] 50 episodes are tested for each setting. [...] For evaluation, we employ a single image as the input to the policy, using 500 samples out of a total of 1000 demos for training. [...] 300 episodes are tested for each setting of all the tasks. |
| Hardware Specification | No | The paper does not provide specific details about the computing hardware (e.g., GPU models, CPU models, memory) used for training or inference, only mentioning the robot arm and cameras for real-world experiments. |
| Software Dependencies | No | The paper mentions using Diffusion Policy (DP) and Stable Diffusion models but does not provide specific version numbers for software dependencies like PyTorch, TensorFlow, CUDA, or other libraries. |
| Experiment Setup | Yes | The hyperparameters for Diffusion Policy, shared across all experiments, are listed in Tab. 6, which provides specific values such as 'batch size 128', 'num epochs 1500', 'learning rate initial 0.0001', and architecture details like 'unet down dims [256, 512, 1024]'. |
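The method quoted above amounts to noising each observation to an intermediate diffusion step before policy training, so that fine-grained appearance differences wash out while coarse scene structure survives. Below is a minimal NumPy sketch of that intuition; it is an illustrative toy, not the authors' DDPM-inversion code, and the schedule constants and array shapes are assumptions. Under a shared noise map, two appearance-perturbed views of the same scene move strictly closer as the inversion step `t` grows.

```python
import numpy as np

# Standard linear beta schedule from DDPM (assumed values, not from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product \bar{alpha}_t

def invert_to_step(x0, t, eps):
    """Map a clean observation x0 to its step-t noisy latent:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
scene = rng.standard_normal((8, 8))                    # toy "observation"
obs_a = scene
obs_b = scene + 0.3 * rng.standard_normal((8, 8))      # appearance perturbation
eps = rng.standard_normal((8, 8))                      # shared noise map

# Distance between the two views at an early vs. a late inversion step.
d_early = np.linalg.norm(invert_to_step(obs_a, 10, eps) - invert_to_step(obs_b, 10, eps))
d_late = np.linalg.norm(invert_to_step(obs_a, 900, eps) - invert_to_step(obs_b, 900, eps))
# The gap scales as sqrt(abar_t) * ||obs_a - obs_b||, so it shrinks with t.
```

Because the perturbation is attenuated by sqrt(alpha_bar[t]), observations "converge" toward a common stem as `t` increases, which is the property the paper exploits for robustness to appearance and lighting shifts.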