Scaling Large Motion Models with Million-Level Human Motions

Authors: Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Weishuai Zeng, Qin Jin, Zongqing Lu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | To address this gap, we present MotionLib, the first million-level dataset for motion generation, which is at least 15× larger than existing counterparts and enriched with hierarchical text descriptions. Using MotionLib, we train a large motion model named Being-M0, demonstrating robust performance across a wide range of human activities, including unseen ones. Through systematic investigation, for the first time, we highlight the importance of scaling both data and model size for advancing motion generation, along with key insights to achieve this goal.
Researcher Affiliation | Collaboration | 1Renmin University of China 2Beijing Academy of Artificial Intelligence 3Institute of Automation, Chinese Academy of Sciences 4Southeast University 5Peking University 6BeingBeyond. Correspondence to: Zongqing Lu <EMAIL>.
Pseudocode | No | The paper describes methods and procedures in narrative text and flowcharts, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | For further details, visit https://beingbeyond.github.io/Being-M0/.
Open Datasets | Yes | In this paper, we aim to address the question: Can scaling the large motion model and data benefit motion generation? To tackle this, we develop a systematic data collection pipeline to build MotionLib, the first large-scale dataset containing over 1.2M motion sequences, at least 15× larger than current counterparts. This initiative provides a solid foundation for building robust, universally applicable motion models and offers a comprehensive testbed for future research.
Dataset Splits | Yes | Following standard practice, each dataset is split into training, validation, and test sets in proportions of 85%, 5%, and 15%, respectively.
Hardware Specification | Yes | For training the large motion model, full parameter tuning is performed on 8 A800 GPUs with a batch size of 1024 over 100 epochs.
Software Dependencies | No | The paper mentions models like GPT2-medium, LLaMA2-7b, LLaMA2-13b, and LLaMA3.1-8b, but does not provide specific version numbers for software libraries or dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the motion tokenizer, we implement the VQ codebook C ∈ R^{1024×512} with an embedding dimensionality of d = 512. The resulting discrete codes are incorporated as additional vocabulary for the LLM. As a comparison, the LFQ codebook has a size of 2^16 = 16384. The motion encoder E uses a temporal downsampling rate of α = 4. We experiment with four large language model (LLM) architectures to construct our large motion model: GPT2-medium (Radford et al., 2019), LLaMA2-7b, LLaMA2-13b (Touvron et al., 2023), and LLaMA3.1-8b (Dubey et al., 2024). The motion tokenizer is trained with a learning rate of 1e-4 and a batch size of 256 for 300K iterations. For training the large motion model, full parameter tuning is performed on 8 A800 GPUs with a batch size of 1024 over 100 epochs. The learning rate is set to 2e-4 for GPT2-medium and 2e-5 for the LLaMA models.
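To make the quoted setup concrete, below is a minimal sketch of the quantization step such a VQ motion tokenizer performs, assuming only the dimensions given in the excerpt (a 1024-entry codebook with d = 512, temporal downsampling α = 4). The function name `quantize` and the random codebook are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical sketch: nearest-neighbor lookup into a VQ codebook
# C in R^{1024 x 512}, as described in the quoted experiment setup.
rng = np.random.default_rng(0)
CODEBOOK_SIZE, EMBED_DIM = 1024, 512
codebook = rng.standard_normal((CODEBOOK_SIZE, EMBED_DIM))

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each d-dim latent to the index of its nearest codebook entry.

    latents: (T, d) encoder outputs; with temporal downsampling alpha = 4,
    a motion of N frames yields T = N // 4 latents.
    """
    # Squared Euclidean distance from every latent to every code: (T, 1024).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Each index is a discrete motion token, usable as extra LLM vocabulary.
    return dists.argmin(axis=1)

# A 64-frame motion -> 64 // 4 = 16 latents -> 16 discrete tokens.
tokens = quantize(rng.standard_normal((64 // 4, EMBED_DIM)))
```

For the LFQ variant mentioned as a comparison, the discrete space would instead have 2^16 = 16384 entries, but the principle of mapping continuous latents to integer tokens is the same.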