Latent Diffusion Planning for Imitation Learning

Authors: Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We focus our experiments on 4 image-based imitation learning tasks: (1) Robomimic Lift, (2) Robomimic Can, (3) Robomimic Square, and (4) ALOHA Sim Transfer Cube. Robomimic (Mandlekar et al., 2021) is a robotic manipulation and imitation benchmark that includes the tasks Lift, Can, and Square. We evaluate the success rate over 50 trials, using the best checkpoint among the last 5 saved checkpoints, with 2 seeds. We create a real-world implementation of the Robomimic Lift task, where the task is to pick up a red block from a randomly initialized position. To evaluate our policies, we calculate the success rate across 45 evaluation trials: to cover the initial state space thoroughly, we evaluate over a 3x3 grid of points with 5 attempts per point, and we evaluate 3 seeds per method. In Table 1, we examine how action-free data can be used to improve imitation learning policies. In Table 2, we present imitation learning results with suboptimal data. In Table 3, we provide results on a Franka Lift Cube task. In Table 4, we find that LDP outperforms LDP Hierarchical across the 3 Robomimic tasks.
Researcher Affiliation | Academia | Amber Xie (1), Oleh Rybkin (2), Dorsa Sadigh (1), Chelsea Finn (1); (1) Stanford, (2) UC Berkeley. Correspondence to: Amber Xie <EMAIL>.
Pseudocode | Yes | Algorithm 1: Inference with Latent Diffusion Planning
1: Input: Encoder E, Planner ε_ψ, IDM ε_ξ, Planner Diffusion Timesteps T_p, IDM Diffusion Timesteps T_IDM, Planning Horizon H_p, Action Horizon H_a
2: Observe initial state s_0 and image x_0; k = 0
3: while not done do
4:   z_k ← (E(x_k), s_k)
Open Source Code | Yes | Project Website and Code: https://amberxie88.github.io/ldp/
Open Datasets | Yes | Robomimic (Mandlekar et al., 2021) is a robotic manipulation and imitation benchmark that includes the tasks Lift, Can, and Square. The Transfer Cube task is a simulated bimanual ALOHA task in which one ViperX 6-DoF arm grabs a block and transfers it to the other arm (Zhao et al., 2023). We use the DROID setup (Khazatsky et al., 2024) and teleoperate via the Oculus Quest 2 headset.
Dataset Splits | Yes | For Can and Square, we use 100 of the 200 demonstrations in the Robomimic datasets; for Lift, we use 3 of the 200 total; and for Transfer Cube, we use 25 demonstrations. Our suboptimal data consists of 500 failed trajectories from an undertrained behavior cloning agent. Our action-free data consists of 100 demonstrations for Lift, Can, and Square from the Robomimic dataset, and 25 demonstrations for Cube.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory) used to train the models or run the experiments. It mentions a "Franka Panda 7-degree-of-freedom robot arm with a wrist-mounted ZED camera" for the real-world task, but this refers to the robotic system itself rather than the computational hardware used for training.
Software Dependencies | No | The paper mentions software components and frameworks such as a "Jax reimplementation of the convolutional Diffusion Policy" and a "Conditional U-Net Architecture", but it does not specify version numbers or a complete list of dependencies.
Experiment Setup | Yes |
Table 8. Diffusion Policy architecture hyperparameters: down dims [256, 512, 1024]; n diffusion steps 100; batch size 256; lr 1e-4; n grad steps 500k.
Table 9. IDM architecture hyperparameters: n blocks 3; n diffusion steps 100; batch size 256; lr 1e-4; n grad steps 500k.
Table 10. VAE architecture hyperparameters: block out channels [128, 256, 256, 256, 256, 256]; down block types [DownEncoderBlock2D] x6; up block types [UpDecoderBlock2D] x6; latent channels 4; latent dim (2, 2, 4); KL beta 1e-5 (Lift), 1e-6 (Can), 1e-6 (Square), 1e-7 (ALOHA Cube); n grad steps 300k.
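The inference procedure quoted under Pseudocode (Algorithm 1: encode the observation, diffuse a plan of future latents, decode actions with the IDM, replan) can be sketched in Python. This is a minimal control-flow sketch, not the authors' implementation: the planner and IDM are stubbed with simple functions, and all shapes, dimensions, and function names are assumptions.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM = 16   # assumed latent size
ACTION_DIM = 7    # assumed action size
H_P = 16          # planning horizon H_p
H_A = 8           # action horizon H_a (actions executed per replan)

def encode(image, state):
    """Stand-in for the encoder E: builds z_k = (E(x_k), s_k)."""
    return np.concatenate([image.ravel()[:LATENT_DIM - len(state)], state])

def plan_latents(z_k, horizon=H_P):
    """Stand-in for the latent diffusion planner eps_psi:
    in LDP this would denoise a sequence of future latents."""
    return np.tile(z_k, (horizon, 1))

def inverse_dynamics(z_t, z_next):
    """Stand-in for the diffusion IDM eps_xi: infers the action
    taking the system from z_t to z_next."""
    return np.zeros(ACTION_DIM)

def ldp_inference(env_step, image, state, max_steps=32):
    """Replanning loop of Algorithm 1: plan H_P latents,
    execute H_A actions via the IDM, then replan."""
    actions_taken = []
    k = 0
    while k < max_steps:            # "while not done do"
        z_k = encode(image, state)  # line 4: z_k <- (E(x_k), s_k)
        plan = plan_latents(z_k)    # denoise a latent plan
        for t in range(min(H_A, len(plan) - 1)):
            a = inverse_dynamics(plan[t], plan[t + 1])
            image, state = env_step(a)
            actions_taken.append(a)
            k += 1
    return actions_taken
```

With the default settings this executes 8 actions between replans, mirroring the receding-horizon style used by diffusion-policy methods.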
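The real-robot evaluation protocol described under Research Type (a 3x3 grid of initial positions, 5 attempts per point, 45 trials per seed) can be sketched as follows; `run_trial` is a hypothetical stand-in for executing the policy once from a given initial position.

```python
import itertools

def evaluate_policy(run_trial, grid_size=3, attempts_per_point=5):
    """Success rate over a grid of initial positions.

    With the defaults this performs 3 * 3 * 5 = 45 trials,
    matching the protocol described in the paper.
    """
    results = []
    for row, col in itertools.product(range(grid_size), repeat=2):
        for _ in range(attempts_per_point):
            results.append(bool(run_trial(row, col)))
    return sum(results) / len(results)
```

Averaging this rate over 3 seeds per method gives the numbers reported for the real-world Lift task.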
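The hyperparameters in Tables 8-10 can be collected into a single configuration for reference. The values come from the tables above; the dictionary structure and key names are assumptions for illustration, not the authors' config format.

```python
# Hyperparameters from Tables 8-10 (key names are illustrative).
ldp_config = {
    "diffusion_policy": {            # Table 8
        "down_dims": [256, 512, 1024],
        "n_diffusion_steps": 100,
        "batch_size": 256,
        "lr": 1e-4,
        "n_grad_steps": 500_000,
    },
    "idm": {                         # Table 9
        "n_blocks": 3,
        "n_diffusion_steps": 100,
        "batch_size": 256,
        "lr": 1e-4,
        "n_grad_steps": 500_000,
    },
    "vae": {                         # Table 10
        "block_out_channels": [128, 256, 256, 256, 256, 256],
        "down_block_types": ["DownEncoderBlock2D"] * 6,
        "up_block_types": ["UpDecoderBlock2D"] * 6,
        "latent_channels": 4,
        "latent_dim": (2, 2, 4),
        "kl_beta": {"lift": 1e-5, "can": 1e-6, "square": 1e-6, "aloha_cube": 1e-7},
        "n_grad_steps": 300_000,
    },
}
```

Note that only the KL weight is tuned per task; the remaining hyperparameters are shared across all four tasks.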