Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

Authors: Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, Matteo Pirotta

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of FB-CPR in a challenging humanoid control problem. Training FB-CPR online with observation-only motion capture datasets, we obtain the first humanoid behavioral foundation model that can be prompted to solve a variety of whole-body tasks, including motion tracking, goal reaching, and reward optimization. The resulting model is capable of expressing human-like behaviors and it achieves competitive performance with task-specific methods while outperforming state-of-the-art unsupervised RL and model-based baselines.
Researcher Affiliation | Collaboration | 1 Fundamental AI Research at Meta, 2 Mila, McGill University, 3 UCL
Pseudocode | Yes | In Alg. 1 we provide a detailed pseudo-code of FB-CPR, including how all losses are computed. Following Touati et al. (2023), we add two regularization losses to improve FB training: an orthonormality loss pushing the covariance Σ_B = E[B(s)B(s)ᵀ] of B towards the identity, and a temporal-difference loss pushing F(s, a, z)ᵀz toward the action-value function of the corresponding reward B(s)ᵀΣ_B⁻¹z. The former helps ensure that B is well-conditioned and does not collapse, while the latter makes F spend more capacity on the directions in z-space that matter for policy optimization.
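The orthonormality regularizer quoted above can be illustrated in isolation. Below is a minimal NumPy sketch, assuming the embeddings B(s) for a mini-batch are stacked into a (batch, d) matrix; the function name and setup are illustrative, not the paper's released implementation.

```python
import numpy as np

def orthonormality_loss(B):
    """Squared Frobenius penalty pushing the empirical covariance of the
    embeddings B(s) towards the identity (illustrative helper, not the
    paper's released code)."""
    batch, d = B.shape
    cov = B.T @ B / batch          # empirical Sigma_B = E[B(s) B(s)^T]
    return float(np.sum((cov - np.eye(d)) ** 2))

# A batch whose empirical covariance is exactly the identity incurs
# (numerically) zero loss, while a generic batch is penalized.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(256, 8)))  # orthonormal columns
B_ortho = Q * np.sqrt(256)                      # scaled so B^T B / 256 = I
B_rand = rng.normal(size=(256, 8))
```

Minimizing this penalty keeps B well-conditioned, which is exactly the "does not collapse" property the quote describes: a collapsed B would make Σ_B rank-deficient and the loss large.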
Open Source Code | Yes | Code, models, and an interactive demo are available at https://metamotivo.metademolab.com.
Open Datasets | Yes | we use the AMASS dataset (Mahmood et al., 2019), a large collection of uncurated motion capture data, for regularization.
Dataset Splits | Yes | After a 10% train-test split, we obtained a train dataset M of 8902 motions and a test dataset M_test of 990 motions, with a total duration of approximately 29 hours and 3 hours, respectively (see Tab. 2 in App. C.2).
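The reported split arithmetic is easy to check. The sketch below assumes a uniform random 10% hold-out over the 9892 retained motions; `split_motions` is a hypothetical helper, and the paper's actual split procedure and seed are not specified here.

```python
import math
import random

def split_motions(motion_ids, test_frac=0.10, seed=0):
    # Hypothetical reconstruction of a uniform 10% hold-out split;
    # not the paper's actual procedure.
    ids = list(motion_ids)
    random.Random(seed).shuffle(ids)
    n_test = math.ceil(len(ids) * test_frac)
    return ids[n_test:], ids[:n_test]  # (train, test)

# 9892 retained motions -> 8902 train / 990 test, matching the counts above.
train_ids, test_ids = split_motions(range(9892))
print(len(train_ids), len(test_ids))  # 8902 990
```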
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or counts) are provided in the paper for the experimental setup.
Software Dependencies | No | The paper mentions software like MuJoCo and dm_control, and uses algorithms such as TD3 and the Adam optimizer, but does not provide specific version numbers for these dependencies. For example: 'The simulation is performed using MuJoCo (Todorov et al., 2012) at 450 Hz, while the control frequency is 30 Hz.' and 'Unless otherwise stated we use the Adam optimizer (Kingma & Ba, 2015)'.
Experiment Setup | Yes | We use a replay buffer of capacity 5M transitions and update agents by sampling mini-batches of 1024 transitions. During online training, we interleave a rollout phase, where we collect 500 transitions across 50 parallel environments, with a model update phase, where we update each network 50 times. The paper also includes Table 3, 'Summary of general training parameters,' which specifies 'Number of environment steps 30M' and 'Discount factor 0.98,' and Table 9, 'Hyperparameters used for FB-CPR pretraining,' detailing parameters like 'z dimension d = 256' and 'Learning rate for F 10^-4'.
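The quoted schedule pins down the update-to-environment-step ratio. A back-of-the-envelope sketch using the stated constants (the constant names are illustrative, taken from the reported values):

```python
# Back-of-the-envelope check of the reported training schedule.
NUM_ENV_STEPS = 30_000_000   # Table 3: number of environment steps
ROLLOUT_TRANSITIONS = 500    # transitions collected per rollout phase
NUM_ENVS = 50                # parallel environments
UPDATES_PER_PHASE = 50       # gradient updates per model-update phase

phases = NUM_ENV_STEPS // ROLLOUT_TRANSITIONS              # rollout/update cycles
steps_per_env_per_phase = ROLLOUT_TRANSITIONS // NUM_ENVS  # steps each env takes
total_updates = phases * UPDATES_PER_PHASE                 # gradient updates overall
print(phases, steps_per_env_per_phase, total_updates)  # 60000 10 3000000
```

So each parallel environment advances 10 steps per phase, and over the full 30M-step run the model receives one gradient update per 10 environment steps.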