How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects

Authors: Wonkwang Lee, Jongwon Jeong, Taehong Moon, Hyeon-Jong Kim, Jaehyeon Kim, Gunhee Kim, Byeong-Uk Lee

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our method learns to generate high-fidelity motions from textual descriptions for diverse and even unseen objects, setting a strong foundation for motion synthesis across diverse object categories and skeletal templates. Qualitative results are available on this link." ... "Extensive experiments on Truebones Zoo dataset demonstrate our framework's ability to generate high-fidelity motions conditioned on textual descriptions, or even synthesize motions for novel objects downloaded from the web."
Researcher Affiliation | Collaboration | 1 Seoul National University, 2 KRAFTON, 3 NVIDIA. Correspondence to: Byeong-Uk Lee <EMAIL>.
Pseudocode | No | The paper describes methods and processes in text and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "To inspire future work and further advancements, we will release the code for our data and model pipelines, along with the annotated captions, establishing a comprehensive benchmark for motion synthesis across diverse objects with heterogeneous skeletal structures."
Open Datasets | Yes | "To address the first challenge, we utilize the Truebones Zoo dataset (Truebones, 2022), which contains over 1,000 artist-created animated armature meshes in FBX format, as illustrated in Figure 1."
Dataset Splits | Yes | "To evaluate the pose synthesis model, we aggregate all motion data for each object category and extract their poses. We then apply clustering, generating 30 distinct pose clusters per object. Three clusters are randomly selected as the test pose set, while the remaining clusters are used for training. For motion synthesis evaluation, we randomly select one motion per object for the test set, with the remaining motions used for training."
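The cluster-based split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify the clustering algorithm or the pose feature representation, so the tiny k-means implementation, the function names, and the synthetic pose features are all assumptions.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Minimal k-means; returns one cluster label per row of x (an assumption —
    the paper only says "we apply clustering")."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each pose to its nearest center, then recompute centers.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def split_poses_by_cluster(poses, n_clusters=30, n_test=3, seed=0):
    """Hold out whole clusters as the test pose set, following the described
    protocol: 30 clusters per object, 3 randomly chosen for testing."""
    labels = kmeans_labels(poses, n_clusters, seed=seed)
    rng = np.random.default_rng(seed)
    test_clusters = rng.choice(n_clusters, size=n_test, replace=False)
    test_mask = np.isin(labels, test_clusters)
    return poses[~test_mask], poses[test_mask]

# Hypothetical pose features for one object category (500 poses, 24-dim).
poses = np.random.default_rng(0).normal(size=(500, 24))
train, test = split_poses_by_cluster(poses)
```

Splitting by whole clusters, rather than by individual poses, keeps near-duplicate poses from leaking between the training and test sets.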
Hardware Specification | Yes | "All models were trained on a Linux system equipped with either an NVIDIA RTX A6000 (48GB) or A100 (40GB) GPU."
Software Dependencies | No | The paper mentions using GPT-4o and SigLIP-SO400M-patch14-384 (Zhai et al., 2023) but does not provide specific version numbers for these or other software libraries/environments.
Experiment Setup | Yes | "The pose diffusion model required approximately 29GB of VRAM with a batch size of 512 over 400K iterations, completing training in roughly 30 hours. The motion diffusion model used about 38GB of VRAM with a batch size of 4 and sequence length of 90, trained for 1M iterations over approximately 4 days." ... "For motions containing more than 90 frames, we randomly sample a chunk of frames during training."
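The random chunk sampling mentioned above can be sketched as a data-loading helper. The function name and the behavior for short motions are assumptions; the paper only states that a chunk of frames is randomly sampled from motions longer than 90 frames.

```python
import numpy as np

def sample_chunk(motion, chunk_len=90, rng=None):
    """Randomly crop a contiguous window of chunk_len frames for training.

    motion: (T, D) array of per-frame features. Motions with T <= chunk_len
    are returned unchanged (padding, if any, is assumed to happen elsewhere).
    """
    if rng is None:
        rng = np.random.default_rng()
    T = motion.shape[0]
    if T <= chunk_len:
        return motion
    start = rng.integers(0, T - chunk_len + 1)  # inclusive start range
    return motion[start:start + chunk_len]
```

Cropping a fresh random window each epoch lets long motions contribute many distinct training sequences at the fixed length of 90.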