ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Authors: Arth Shukla, Stone Tao, Hao Su

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train extensive reinforcement learning (RL) and imitation learning (IL) baselines for future work to compare against. Leveraging our fast environments, we run extensive RL baselines, training 150 policies across 3 seeds (50 policies/seed) with 1.83 billion environment samples.
Researcher Affiliation | Collaboration | Arth Shukla, Stone Tao & Hao Su; Hillbot Inc. and University of California, San Diego; EMAIL
Pseudocode | No | The paper describes methods and processes textually and through mathematical formulations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, figures, or sections.
Open Source Code | Yes | Videos, models, data, code, and more at http://arth-shukla.github.io/mshab
Open Datasets | Yes | The Home Assistant Benchmark (HAB) (Szot et al., 2021) includes three long-horizon tasks which involve rearranging objects from the YCB dataset (Calli et al., 2015). The ReplicaCAD dataset (Szot et al., 2021) serves as the source for our apartment scenes.
Dataset Splits | Yes | The dataset is split into three parts: 3 macro-variations for training, 1 for validation, and 1 for testing. However, as the test split is not publicly accessible, our study utilizes only the train and validation splits. Furthermore, for each long-horizon task, HAB provides 10,000 training episode configurations and 1,000 validation configurations.
Hardware Specification | Yes | Our benchmarking is conducted on a machine equipped with a 16-core/32-thread Intel i9-12900KS processor and an Nvidia RTX 4090 GPU with 24 GB VRAM.
Software Dependencies | No | The paper mentions several algorithms and frameworks like SAC, PPO, D4PG, Nature CNN, and ManiSkill3, but it does not specify any version numbers for these software components or libraries.
Experiment Setup | Yes | We stack 3 consecutive frames for image observations to handle partial observability. We train Pick and Place using SAC (Haarnoja et al., 2018; Xing, 2022) with a 1M replay buffer size. Visual observations are encoded by D4PG's 4-layer CNN (Barth-Maron et al., 2018) and concatenated with state observations. Actor and critic networks are 3-layer MLPs, and the critic uses LayerNorm to avoid value divergence (Ball et al., 2023). We train Pick for 50M timesteps and Place for 25M timesteps. We train Open and Close using PPO (Schulman et al., 2017; Huang et al., 2022). Visual observations are encoded by a Nature CNN (Mnih et al., 2015) and concatenated with state observations. The actor and critic networks are 2-layer MLPs. We train Open Fridge for 15M timesteps, Open Drawer for 50M timesteps, Close Fridge for 25M timesteps, and Close Drawer for 15M timesteps. We train 3 seeds for each task/subtask/object combination, evaluating on 189 episodes every 100,000 train samples. We select the checkpoint with the highest evaluation success rate as our final policy.
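The 3-frame observation stacking mentioned in the experiment setup can be sketched as a small wrapper; this is a minimal illustration in NumPy, not code from the released MS-HAB repository, and the class and parameter names are hypothetical.

```python
# Minimal sketch of k-frame observation stacking for partial observability.
# On reset, the buffer is filled with copies of the first frame so the
# stacked shape stays constant; each step appends the newest frame and
# drops the oldest.
from collections import deque
import numpy as np

class FrameStack:
    def __init__(self, k=3):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is evicted automatically

    def reset(self, obs):
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=0)  # stack along channel axis

    def step(self, obs):
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=0)

obs = np.zeros((3, 64, 64), dtype=np.float32)  # a single C,H,W image
stack = FrameStack(k=3)
stacked = stack.reset(obs)
assert stacked.shape == (9, 64, 64)  # 3 frames x 3 channels
```

Stacking along the channel axis lets the same CNN encoder consume the history without any architectural change.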
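The checkpoint-selection rule (evaluate on 189 episodes every 100,000 train samples, keep the checkpoint with the highest evaluation success rate) amounts to an argmax over the evaluation log. A hypothetical sketch, with an invented function name and log format:

```python
# Hypothetical sketch of the checkpoint-selection rule: keep the
# checkpoint whose periodic evaluation achieved the highest success rate.
def select_final_checkpoint(eval_log):
    """eval_log: list of (train_step, success_rate) pairs."""
    return max(eval_log, key=lambda entry: entry[1])

log = [(100_000, 0.12), (200_000, 0.47), (300_000, 0.45)]
best_step, best_sr = select_final_checkpoint(log)
# best_step == 200_000, best_sr == 0.47
```

Note that `max` returns the first maximal entry, so ties are broken in favor of the earlier checkpoint.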