EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation

Authors: Carl Qi, Dan Haramati, Tal Daniel, Aviv Tamar, Amy Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To study the above, we evaluate our method on 7 goal-conditioned multi-object manipulation tasks across 3 simulated environments and compare against several competitive BC baselines learning from various image representations.
Researcher Affiliation | Collaboration | 1 UT Austin, 2 Technion, Israel Institute of Technology, 3 Brown University, 4 Meta AI
Pseudocode | No | The paper describes the architecture in text and with Figure 1, but does not include any explicit pseudocode blocks labeled "Pseudocode" or "Algorithm", nor does it present structured steps formatted like code.
Open Source Code | Yes | We provide the project code to reproduce the experiments at https://github.com/carl-qi/EC-Diffuser.
Open Datasets | Yes | The datasets used are publicly available, and the work aims to improve the efficiency of robotic object manipulation tasks.
Dataset Splits | Yes | We subsample 3000 episodes from the real robot dataset, each padded to 45 images, and we randomly select 2700 episodes for training and 300 for validation.
Hardware Specification | Yes | For GPUs, we use both NVIDIA RTX A5500 (20GB) and NVIDIA A40 (40GB), though our model training requires only around 8GB of memory.
Software Dependencies | No | The paper mentions using specific codebases like "PINT modules from DDLP Daniel & Tamar (2024)", "Diffuser Janner et al. (2022) codebase", "VQ-BeT (Lee et al., 2024)" and "ECRL (Haramati et al., 2024)". However, it does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table 5: Hyper-parameters used for the EC-Diffuser model. Batch size: 32; Learning rate: 8e-5; Diffusion steps: 5, 100 (generalization tasks); Horizon: 5; Number of heads: 8; Number of layers: 6, 12 (generalization tasks); Hidden dimensions: 256, 512 (generalization tasks)
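
For readers who want to carry the reported values into a script, the following is a minimal sketch and not code from the EC-Diffuser repository. The class and function names (ECDiffuserConfig, generalization_config, split_episodes), the field names, and the seed are hypothetical. It also assumes that the second value in entries such as "5, 100 (generalization tasks)" applies only to the generalization tasks, and that the 2700/300 split is a uniform random shuffle.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ECDiffuserConfig:
    """Hypothetical container for the Table 5 values (field names are assumptions)."""
    batch_size: int = 32
    learning_rate: float = 8e-5
    diffusion_steps: int = 5   # assumed default for non-generalization tasks
    num_heads: int = 8
    num_layers: int = 6        # assumed default for non-generalization tasks
    hidden_dim: int = 256      # assumed default for non-generalization tasks


def generalization_config() -> ECDiffuserConfig:
    """Values listed with "(generalization tasks)" in Table 5."""
    return ECDiffuserConfig(diffusion_steps=100, num_layers=12, hidden_dim=512)


def split_episodes(num_episodes: int = 3000, num_train: int = 2700, seed: int = 0):
    """Randomly split episode indices into train and validation sets.

    The seed is an assumption; the source does not state one.
    """
    rng = random.Random(seed)
    indices = list(range(num_episodes))
    rng.shuffle(indices)
    return indices[:num_train], indices[num_train:]


if __name__ == "__main__":
    train_ids, val_ids = split_episodes()
    print(len(train_ids), len(val_ids))
    print(generalization_config())
```

The split function only produces index lists, so it can be paired with whatever episode loader the actual codebase uses.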