Latent Diffusion Planning for Imitation Learning
Authors: Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We focus our experiments on 4 image-based imitation learning tasks: (1) Robomimic Lift, (2) Robomimic Can, (3) Robomimic Square, and (4) ALOHA Sim Transfer Cube. Robomimic (Mandlekar et al., 2021) is a robotic manipulation and imitation benchmark, including the tasks Lift, Can, and Square. We evaluate the success rate out of 50 trials, using the best checkpoint from the last 5 saved checkpoints, with 2 seeds. We create a real world implementation of the Robomimic Lift task, where the task is to pick up a red block from a randomly initialized position. To evaluate our policies, we calculate the success rate across 45 evaluation trials. To thoroughly evaluate performance across the initial state space, we evaluate across a grid of 3x3 points, with 5 attempts per point. We evaluate 3 seeds per method. In Table 1, we examine how action-free data can be used to improve imitation learning policies. In Table 2, we present imitation learning results with suboptimal data. In Table 3, we provide results on a Franka Lift Cube task. In Table 4, we find that LDP outperforms LDP Hierarchical across the 3 Robomimic tasks. |
| Researcher Affiliation | Academia | Amber Xie¹, Oleh Rybkin², Dorsa Sadigh¹, Chelsea Finn¹. ¹Stanford, ²UC Berkeley. Correspondence to: Amber Xie <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Inference with Latent Diffusion Planning. 1: Input: Encoder E, Planner ϵ_ψ, IDM ϵ_ξ, Planner Diffusion Timesteps T_p, IDM Diffusion Timesteps T_IDM, Planning Horizon H_p, Action Horizon H_a. 2: Observe initial state s_0 and image x_0; k = 0. 3: while not done do 4: z_k ← (E(x_k), s_k) |
| Open Source Code | Yes | 1Project Website and Code: https://amberxie88.github.io/ldp/ |
| Open Datasets | Yes | Robomimic (Mandlekar et al., 2021) is a robotic manipulation and imitation benchmark, including the tasks Lift, Can, and Square. The Transfer Cube task is a simulated bimanual ALOHA task, in which one ViperX 6-DoF arm grabs a block and transfers it to the other arm (Zhao et al., 2023). We use the DROID setup (Khazatsky et al., 2024) and teleoperate via the Oculus Quest 2 headset. |
| Dataset Splits | Yes | For Can and Square, we use 100 out of the 200 demonstrations in the Robomimic datasets; for Lift, we use 3 demonstrations out of the 200 total; and for Transfer Cube, we use 25 demonstrations. Our suboptimal data consists of 500 failed trajectories from an undertrained behavior cloning agent. Our action-free data consists of 100 demonstrations for Lift, Can, and Square from the Robomimic dataset, and 25 demonstrations for Cube. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments or training the models. It mentions the 'Franka Panda 7 degree of freedom robot arm, with a wrist-mounted Zed camera' for the real-world task, but this refers to the robotic system itself rather than the computational hardware used for training. |
| Software Dependencies | No | The paper mentions software components and frameworks such as a "Jax reimplementation of the convolutional Diffusion Policy" and a "Conditional U-Net Architecture", but it does not specify version numbers or provide a complete list of software dependencies. |
| Experiment Setup | Yes | Table 8 (Diffusion Policy Architecture Hyperparameters): down dims [256, 512, 1024]; n diffusion steps 100; batch size 256; lr 1e-4; n grad steps 500k. Table 9 (IDM Architecture Hyperparameters): n blocks 3; n diffusion steps 100; batch size 256; lr 1e-4; n grad steps 500k. Table 10 (VAE Architecture Hyperparameters): block out channels [128, 256, 256, 256, 256, 256]; down block types [DownEncoderBlock2D] x6; up block types [UpDecoderBlock2D] x6; latent channels 4; latent dim (2, 2, 4); Lift KL beta 1e-5; Can KL beta 1e-6; Square KL beta 1e-6; ALOHA Cube KL beta 1e-7; n grad steps 300k. |
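The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched as a short Python loop. This is a minimal sketch with stub models, not the paper's implementation: `encode`, `plan_latents`, and `infer_actions` are hypothetical stand-ins for the encoder E, planner ϵ_ψ, and IDM ϵ_ξ, and the environment stepping is omitted.

```python
# Minimal sketch of Algorithm 1 (Inference with Latent Diffusion Planning).
# All model functions below are stubs; the real encoder, planner, and IDM
# are learned networks, and the planner/IDM each run diffusion denoising.
import numpy as np

def encode(x, s):
    """Stub encoder E: pool the image and append the proprioceptive state
    (an assumption about the latent's contents, for illustration only)."""
    return np.concatenate([x.mean(axis=(0, 1)), s])

def plan_latents(z, horizon):
    """Stub planner eps_psi: produce H_p future latents. The real planner
    denoises a latent plan over T_p diffusion steps conditioned on z."""
    return np.tile(z, (horizon, 1))

def infer_actions(z_plan, action_horizon):
    """Stub IDM eps_xi: map planned latents to H_a actions. The real IDM
    denoises actions over T_IDM diffusion steps; 7-dim actions assumed."""
    return np.zeros((action_horizon, 7))

def ldp_inference(x0, s0, Hp=16, Ha=8, max_steps=2):
    """Inference loop: encode, plan latents, decode actions, repeat."""
    x, s = x0, s0
    actions_taken = []
    for k in range(max_steps):            # stands in for "while not done"
        z = encode(x, s)                  # z_k <- (E(x_k), s_k)
        z_plan = plan_latents(z, Hp)      # plan H_p future latents
        acts = infer_actions(z_plan, Ha)  # extract H_a actions via the IDM
        actions_taken.append(acts)        # (environment stepping omitted)
    return np.concatenate(actions_taken)
```

The receding-horizon structure (plan H_p latents, execute only H_a actions, then re-plan) mirrors the planning/action horizon split in the algorithm's inputs.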
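The flattened hyperparameter tables in the Experiment Setup row can be restated as plain Python dicts for readability. The values come from the paper's Tables 8 through 10; the dict names and grouping are ours, not the paper's code.

```python
# Hyperparameters from Tables 8-10, restated as config dicts (grouping ours).
DIFFUSION_POLICY = {              # Table 8
    "down_dims": [256, 512, 1024],
    "n_diffusion_steps": 100,
    "batch_size": 256,
    "lr": 1e-4,
    "n_grad_steps": 500_000,
}
IDM = {                           # Table 9
    "n_blocks": 3,
    "n_diffusion_steps": 100,
    "batch_size": 256,
    "lr": 1e-4,
    "n_grad_steps": 500_000,
}
VAE = {                           # Table 10
    "block_out_channels": [128, 256, 256, 256, 256, 256],
    "down_block_types": ["DownEncoderBlock2D"] * 6,
    "up_block_types": ["UpDecoderBlock2D"] * 6,
    "latent_channels": 4,
    "latent_dim": (2, 2, 4),
    "kl_beta": {"Lift": 1e-5, "Can": 1e-6, "Square": 1e-6, "ALOHA Cube": 1e-7},
    "n_grad_steps": 300_000,
}
```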