Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Authors: Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the efficacy of our approach, we propose a new task suite for visual whole-body humanoid control with a simulated 56-DoF humanoid, which contains a total of 8 challenging tasks. We show that our method produces highly performant control policies across all tasks compared to a set of strong model-free and model-based baselines: SAC (Haarnoja et al., 2018), DreamerV3 (Hafner et al., 2023), and TD-MPC2 (Hansen et al., 2024). Furthermore, we find that motions generated by our method are broadly preferred by humans in a user study with 51 participants. We conclude the paper by carefully dissecting how each of our design choices influences results. Code for method and environments is available at https://www.nicklashansen.com/rlpuppeteer.
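The row above describes a hierarchical ("Puppeteer") controller: a high-level world model consumes task observations and proposes a command that a pretrained low-level tracking model converts into joint-level actions for the 56-DoF humanoid. A minimal sketch of that two-level control loop, assuming illustrative class and method names that are not the paper's actual API:

```python
# Hypothetical sketch of a two-level control loop: a high-level model proposes
# a command, a low-level tracking model turns it into joint actions.
# All names (Puppeteer, TrackingModel, control_step) are assumptions for
# illustration, not the authors' implementation.
import numpy as np


class TrackingModel:
    """Stand-in for the pretrained low-level tracking world model."""

    def act(self, proprio: np.ndarray, command: np.ndarray) -> np.ndarray:
        # The real model plans with TD-MPC2 toward the commanded reference;
        # here a bounded toy action stands in for that planner output.
        return np.tanh(command - proprio)


class Puppeteer:
    """Stand-in for the high-level (puppeteering) world model."""

    def command(self, visual_obs: np.ndarray, proprio: np.ndarray) -> np.ndarray:
        # The real model plans an abstract command from visual + proprioceptive
        # input; this placeholder just perturbs the current proprioception.
        return proprio + 0.1 * visual_obs.mean()


def control_step(high: Puppeteer, low: TrackingModel,
                 visual_obs: np.ndarray, proprio: np.ndarray) -> np.ndarray:
    cmd = high.command(visual_obs, proprio)  # high level sees task observations
    return low.act(proprio, cmd)             # low level tracks the command


# Toy usage with random observations for a 56-DoF humanoid.
rng = np.random.default_rng(0)
action = control_step(Puppeteer(), TrackingModel(),
                      visual_obs=rng.standard_normal((64, 64)),
                      proprio=rng.standard_normal(56))
assert action.shape == (56,)
```

The point of the sketch is the information flow: only the high level needs visual input, while the low level operates on proprioception plus the command.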
Researcher Affiliation | Collaboration | Nicklas Hansen (UC San Diego), Jyothir S V (New York University), Vlad Sobal (New York University), Yann LeCun (New York University, Meta AI), Xiaolong Wang (UC San Diego), Hao Su (UC San Diego)
Pseudocode | No | The paper describes the components of the world model and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and videos: https://www.nicklashansen.com/rlpuppeteer
Open Datasets | Yes | We leverage pre-existing human MoCap data (CMU, 2003) retargeted to the 56-DoF CMU Humanoid embodiment (Tassa et al., 2018) during training of the tracking model... We use the small offline dataset provided by MoCapAct (Wagener et al., 2022), which is available at https://microsoft.github.io/MoCapAct.
Dataset Splits | No | The paper describes training on a mixture of offline data and online interactions, sampling 50% of each batch from the offline dataset and 50% from the online replay buffer at every gradient update. It evaluates tracking quality over "all clips in the dataset" and uses a task suite for online learning. However, it specifies neither training/validation/test splits of the MoCap dataset nor how the data generated during online interaction in the downstream tasks is partitioned, so the data partitioning cannot be reproduced.
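The 50/50 batch mixing described in this row can be sketched in a few lines. This is a minimal illustration under assumed data structures (lists of transitions), not the authors' code:

```python
# Minimal sketch of mixed offline/online batch sampling: each gradient step
# draws half the batch from a fixed offline dataset and half from the online
# replay buffer. Function and variable names are illustrative assumptions.
import random


def sample_mixed_batch(offline_data, replay_buffer,
                       batch_size=256, offline_frac=0.5):
    """Return a batch mixing offline and online transitions."""
    n_offline = int(batch_size * offline_frac)
    batch = random.sample(offline_data, n_offline)           # without replacement
    batch += random.choices(replay_buffer,                   # with replacement,
                            k=batch_size - n_offline)        # buffer may be small
    return batch


# Toy usage: tag each transition with its source so the mix is visible.
offline = [("off", i) for i in range(1000)]
online = [("on", i) for i in range(500)]
batch = sample_mixed_batch(offline, online, batch_size=8)
n_off = sum(1 for src, _ in batch if src == "off")
assert n_off == 4  # offline_frac=0.5 of batch_size=8
```

Sampling the replay-buffer half with replacement is one common choice early in training, when the buffer may hold fewer transitions than the requested half-batch.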
Hardware Specification | Yes | Training the tracking world model takes approximately 12 days, and training the puppeteering world model takes approximately 4 days, both on a single NVIDIA GeForce RTX 3090 GPU. CPU and RAM usage is negligible. System requirements are detailed in Appendix C. Table 3. System requirements: training wall-time, inference time, and GPU memory requirements for Puppeteer, TD-MPC2, and SAC on a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | We rely on DMControl and MuJoCo for simulation, which are publicly available and licensed under the Apache 2.0 license. We base our implementation off of TD-MPC2 and use default design choices and hyperparameters whenever possible. We use the official implementation available at https://github.com/nicklashansen/tdmpc2.
Experiment Setup | Yes | Implementation. We pretrain a single 5M-parameter TD-MPC2 world model to track all 836 CMU MoCap (CMU, 2003) reference motions retargeted to the CMU Humanoid model... All hyperparameters are listed in Table 5. Table 5. List of hyperparameters: we use the same hyperparameters across all tasks, levels (high-level and low-level), and across both Puppeteer and TD-MPC2 when applicable. Hyperparameters unique to Puppeteer are highlighted.