LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Authors: Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  Extensive experimental results on multiple datasets demonstrate substantial improvements over previous methods across all three tasks. Quote: "Extensive experiments conducted across various datasets demonstrate the effectiveness of our approach in text-to-motion generation, motion-text retrieval, and motion-to-text captioning, with significant improvements compared to previous state-of-the-art methods."

Researcher Affiliation | Collaboration
  1 Huazhong University of Science and Technology; 2 Alibaba Group; 3 Nanjing University

Pseudocode | No
  The paper describes its methods in prose and does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code | No
  Project page: https://aigc3d.github.io/LaMP/. The link is a project page that neither states that the source code for the described methodology is available nor links directly to a code repository.

Open Datasets | Yes
  Quote: "We evaluate our model on HumanML3D (Guo et al., 2022a) and KIT-ML (Plappert et al., 2016) datasets."

Dataset Splits | Yes
  Quote: "we allocate 23,384 samples for training, 1,460 for validation, and 4,383 for testing within HumanML3D, and utilize 4,888 for training, 300 for validation, and 830 for testing in KIT-ML."

Hardware Specification | Yes
  Quote: "Our model is implemented on NVIDIA A100 GPU using PyTorch."

Software Dependencies | No
  Quote: "Our model is implemented on NVIDIA A100 GPU using PyTorch." This mentions PyTorch but does not specify a version number.

Experiment Setup | Yes
  Quote: "For the motion VQ-VAE, we employ resblocks for both the encoder and decoder, with a downscale factor of 4. The VQ consists of 6 quantization layers, where each layer's codebook contains 512 512-dimensional codes. The quantization dropout ratio p is set to 0.2. The masked transformer is composed of 6 transformer layers with causal attention masks, 6 heads, and a latent dimension of 384. The learning rate reaches 2e-4 after 2000 iterations with a linear warm-up schedule for the training of all models. During inference, we set the CFG scale of the masked transformer as 4 on HumanML3D, and 2 on KIT-ML. Meanwhile, K was set to 10 on both datasets."
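The experiment-setup values quoted above can be collected into a configuration sketch. Since no code is released, every field name below (e.g. VQVAEConfig, cfg_scale, num_quant_layers) is a hypothetical label of ours; only the numeric values come from the paper's quote.

```python
from dataclasses import dataclass

@dataclass
class VQVAEConfig:
    # Motion VQ-VAE settings reported in the paper (field names are ours).
    downscale_factor: int = 4      # encoder/decoder downscale factor
    num_quant_layers: int = 6      # residual quantization layers
    codebook_size: int = 512       # codes per layer
    code_dim: int = 512            # dimensionality of each code
    quant_dropout_p: float = 0.2   # quantization dropout ratio p

@dataclass
class MaskedTransformerConfig:
    # Masked transformer with causal attention masks.
    num_layers: int = 6
    num_heads: int = 6
    latent_dim: int = 384
    peak_lr: float = 2e-4          # reached after warm-up
    warmup_iters: int = 2000       # linear warm-up schedule

# Dataset-specific inference settings: CFG scale differs per dataset,
# K = 10 on both.
INFERENCE = {
    "HumanML3D": {"cfg_scale": 4, "K": 10},
    "KIT-ML": {"cfg_scale": 2, "K": 10},
}
```

This is only a bookkeeping sketch of the reported hyperparameters, not the authors' implementation.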