Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
Authors: Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, Matteo Pirotta
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of FB-CPR in a challenging humanoid control problem. Training FB-CPR online with observation-only motion capture datasets, we obtain the first humanoid behavioral foundation model that can be prompted to solve a variety of whole-body tasks, including motion tracking, goal reaching, and reward optimization. The resulting model is capable of expressing human-like behaviors and it achieves competitive performance with task-specific methods while outperforming state-of-the-art unsupervised RL and model-based baselines. |
| Researcher Affiliation | Collaboration | ¹Fundamental AI Research at Meta, ²Mila, McGill University, ³UCL |
| Pseudocode | Yes | In Alg. 1 we provide a detailed pseudo-code of FB-CPR including how all losses are computed. Following Touati et al. (2023), we add two regularization losses to improve FB training: an orthonormality loss pushing the covariance Σ_B = E[B(s)B(s)ᵀ] of B towards the identity, and a temporal difference loss pushing F(s, a, z)ᵀz toward the action-value function of the corresponding reward B(s)ᵀΣ_B⁻¹z. The former is helpful to make sure that B is well-conditioned and does not collapse, while the latter makes F spend more capacity on the directions in z space that matter for policy optimization. |
| Open Source Code | Yes | Code, models, and an interactive demo are available at https://metamotivo.metademolab.com. |
| Open Datasets | Yes | we use the AMASS dataset (Mahmood et al., 2019), a large collection of uncurated motion capture data, for regularization. |
| Dataset Splits | Yes | After a 10% train-test split, we obtained a train dataset M of 8902 motions and a test dataset MTEST of 990 motions, with a total duration of approximately 29 hours and 3 hours, respectively (see Tab. 2 in App. C.2). |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or types) are provided in the paper for the experimental setup. |
| Software Dependencies | No | The paper mentions software like MuJoCo and dm_control, and uses algorithms such as TD3 and the Adam optimizer, but does not provide specific version numbers for these software dependencies. For example, 'The simulation is performed using MuJoCo (Todorov et al., 2012) at 450 Hz, while the control frequency is 30 Hz.' and 'Unless otherwise stated we use the Adam optimizer (Kingma & Ba, 2015)'. |
| Experiment Setup | Yes | We use a replay buffer of capacity 5M transitions and update agents by sampling mini-batches of 1024 transitions. During online training, we interleave a rollout phase, where we collect 500 transitions across 50 parallel environments, with a model update phase, where we update each network 50 times. The paper also includes Table 3, 'Summary of general training parameters,' which specifies 'Number of environment steps 30M' and 'Discount factor 0.98,' and Table 9, 'Hyperparameters used for FB-CPR pretraining,' detailing parameters like 'z dimension d 256' and 'Learning rate for F 10⁻⁴'. |
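The orthonormality regularizer quoted in the Pseudocode row (pushing the empirical covariance Σ_B = E[B(s)B(s)ᵀ] toward the identity so that B stays well-conditioned) can be sketched as follows. This is a minimal numpy illustration of the penalty as described, not the authors' implementation; the batch shape and the squared-Frobenius-norm form of the loss are assumptions.

```python
import numpy as np

def orthonormality_loss(B_batch: np.ndarray) -> float:
    """Penalty pushing the empirical covariance of embeddings toward identity.

    B_batch: (n, d) array, rows are embeddings B(s) for a mini-batch of states.
    Returns ||Sigma_B - I||_F^2, where Sigma_B = (1/n) * B^T B estimates
    E[B(s) B(s)^T]. A zero loss means the embedding dimensions are
    uncorrelated with unit variance, i.e. B has not collapsed.
    """
    n, d = B_batch.shape
    cov = B_batch.T @ B_batch / n           # empirical Sigma_B, shape (d, d)
    return float(np.sum((cov - np.eye(d)) ** 2))

# Usage: rows of sqrt(n) * Q (Q orthogonal) give Sigma_B = I, hence zero loss,
# while a collapsed (all-zero) batch is maximally penalized.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
well_conditioned = np.sqrt(4) * Q
print(orthonormality_loss(well_conditioned))   # ~0: covariance is identity
print(orthonormality_loss(np.zeros((8, 4))))   # 4.0: ||0 - I_4||_F^2 = d
```

In the paper this term is one of two FB regularizers (the other being the temporal-difference loss on F(s, a, z)ᵀz); here only the covariance penalty is shown, computed per mini-batch as training would.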