Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization
Authors: Guangtao Lyu, Chenghao Xu, Jiexi Yan, Muli Yang, Cheng Deng
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive analyses and extensive experiments across multiple public datasets demonstrate that our model achieves state-of-the-art performance across various tasks and scenarios. |
| Researcher Affiliation | Academia | 1 School of Electronic Engineering, Xidian University, Xi'an, Shaanxi, China, 2 School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China, 3 Institute for Infocomm Research (I2R), A*STAR, Singapore |
| Pseudocode | No | The paper describes methods using equations and natural language, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories. |
| Open Datasets | Yes | To validate the effectiveness of our sparse lexical representations, we conduct experiments on two commonly used public datasets: the HumanML3D (Guo et al., 2022a) dataset and the KIT Motion-Language dataset (Plappert et al., 2016). |
| Dataset Splits | Yes | The HumanML3D dataset extends the AMASS (Mahmood et al., 2019) and HumanAct12 (Guo et al., 2020) motion capture datasets by adding natural language annotations, comprising 23,384 motions for training, 1,460 for validation, and 4,380 for testing. The KIT-ML dataset, focused primarily on locomotion and derived from motion capture data, comprises 4,888 motions for training, 300 for validation, and 830 for testing. |
| Hardware Specification | Yes | We compare the training time of different models on the HumanML3D dataset using a single A6000 GPU and report the results in Table 12. |
| Software Dependencies | No | The paper mentions using pretrained BERT (Devlin, 2018) as a text encoder and a transformer (Vaswani et al., 2017) for the motion encoder, along with the Adam optimizer (Kingma & Ba, 2014). However, it does not specify concrete version numbers for any software libraries or programming languages used (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | We utilize pretrained BERT (Devlin, 2018) as our text encoder and implement a transformer (Vaswani et al., 2017) with spatial and temporal attention mechanisms for the motion encoder. Our experiments employ the Adam optimizer (Kingma & Ba, 2014), with learning rates set to 10⁻⁵ for the text encoder, 10⁻⁴ for the motion encoder, and 10⁻³ for the Lexical Disentanglement Head and Lexical Bottleneck Masked Decoder. During the LexMLM phase, we train with a batch size of 128 for 50 epochs. In the CMMM phase, we use a batch size of 64 and train for 200 epochs. For the LexMMM phase, we freeze the lexical space and fine-tune the motion encoder to align with the language domain, using a batch size of 64 for 150 epochs. Finally, in the LexCMLP phase, we use a batch size of 64 and train for 20 epochs at a learning rate of 10⁻⁵. |
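The experiment-setup row above lists per-module learning rates and a four-phase training schedule. A minimal sketch of that configuration, transcribed into plain Python, is shown below; the phase and module names follow the paper's terminology, but the dictionary layout and the `total_epochs` helper are illustrative, not the authors' implementation.

```python
# Per-module Adam learning rates as reported in the paper's setup description.
LEARNING_RATES = {
    "text_encoder": 1e-5,                     # pretrained BERT
    "motion_encoder": 1e-4,                   # spatial/temporal transformer
    "lexical_disentanglement_head": 1e-3,
    "lexical_bottleneck_masked_decoder": 1e-3,
}

# The four reported training phases with their batch sizes and epoch counts.
PHASES = [
    {"name": "LexMLM",  "batch_size": 128, "epochs": 50},
    {"name": "CMMM",    "batch_size": 64,  "epochs": 200},
    {"name": "LexMMM",  "batch_size": 64,  "epochs": 150},  # lexical space frozen
    {"name": "LexCMLP", "batch_size": 64,  "epochs": 20, "lr": 1e-5},
]

def total_epochs(phases):
    """Sum the epoch counts across all training phases."""
    return sum(p["epochs"] for p in phases)
```

Under this transcription the full schedule amounts to `total_epochs(PHASES)` = 420 epochs across the four phases.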