MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer

Authors: Yilin Wang, Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Xinxin Zuo, Juwei Lu, Hai Jiang, Li Cheng

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
4 EXPERIMENT AND RESULTS
4.1 IMPLEMENTATION DETAILS
Dataset: We collect 30 motion sequences from Mixamo (Mixamo, 2023) with human-like skeletons and 30 motion sequences with skeletons of animals and artist-crafted creatures from the Truebone ZOO (Studio, 2023) dataset to form the Sin Motion dataset used for evaluation.
Evaluation Metrics: We apply 5 sets of metrics to measure the expressiveness of local motion patterns and synthesis diversity, respectively, following Li et al. (2022a) and Raab et al. (2024).
4.2 COMPARISON WITH STATE-OF-THE-ART METHODS
Quantitative Comparison: For each reference motion in the Sin Motion dataset, we randomly generate 20 samples with Lg = L for measuring the metrics. Table 1 presents the quantitative results of our method compared to the state-of-the-art methods.
4.3 ABLATION: CODEBOOK DISTRIBUTION REGULARIZATION
We run the training of single motion tokenization without Ltoken and compare the results with our method quantitatively in Table 2.
Researcher Affiliation: Collaboration
Yilin Wang1, Chuan Guo1, Yuxuan Mu3, Muhammad Gohar Javed1, Xinxin Zuo2, Juwei Lu4, Hai Jiang1, Li Cheng1
1 University of Alberta; 2 Concordia University; 3 Simon Fraser University; 4 Noah's Ark Lab, Huawei Canada
Pseudocode: No
The paper describes the proposed method and architecture in detail, including mathematical formulations for attention mechanisms and loss functions. However, it does not present these steps or procedures in a structured pseudocode or algorithm block.
Open Source Code: Yes
Our project page includes visualization demos and implementation code. Visit our project page: https://motiondreamer.github.io/.
Open Datasets: Yes
We collect 30 motion sequences from Mixamo (Mixamo, 2023) with human-like skeletons and 30 motion sequences with skeletons of animals and artist-crafted creatures from the Truebone ZOO (Studio, 2023) dataset to form the Sin Motion dataset used for evaluation.
Dataset Splits: No
The paper describes the collection of the "Sin Motion dataset" for evaluation, consisting of 30 long and 30 short sequences. For quantitative comparison, it states: "For each reference motion in the Sin Motion dataset, we randomly generate 20 samples with Lg = L for measuring the metrics." This describes the evaluation process. However, the core method learns from a single reference motion, and explicit training/validation/test splits of a dataset for model training are not provided.
Hardware Specification: Yes
All the reference motions are trained and evaluated on a single RTX 2080 Ti GPU.
Software Dependencies: Yes
The beat-aligned dance synthesis incorporates auxiliary beat features, extracted with librosa (McFee et al., 2015), as a temporally aligned feature for the motion tokens by employing an additional pair of lightweight encoder-decoders.
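To illustrate the kind of temporally aligned beat feature described above, here is a minimal numpy-only sketch (not the authors' implementation). It rasterizes beat timestamps into a per-frame binary indicator that could be paired with motion tokens; in practice the timestamps would come from an audio beat tracker such as librosa.beat.beat_track, and the function name, frame rate, and (num_frames,) layout are assumptions:

```python
import numpy as np

def beat_feature(beat_times, num_frames, fps=30.0):
    """Rasterize beat timestamps (in seconds) into a per-frame binary
    indicator aligned with a motion clip of `num_frames` frames.

    Beat timestamps are passed in directly so the sketch stays
    self-contained; a real pipeline would obtain them from an audio
    beat tracker (e.g. librosa.beat.beat_track)."""
    feat = np.zeros(num_frames, dtype=np.float32)
    # Map each timestamp to its nearest frame index, dropping any
    # beats that fall outside the clip.
    idx = np.round(np.asarray(beat_times) * fps).astype(int)
    idx = idx[(idx >= 0) & (idx < num_frames)]
    feat[idx] = 1.0
    return feat

# Beats every 0.5 s over a 2-second, 60-frame clip at 30 fps.
f = beat_feature([0.0, 0.5, 1.0, 1.5], num_frames=60, fps=30.0)
```

A per-frame vector like `f` can then be encoded alongside the motion tokens by the lightweight encoder-decoder pair the paper mentions.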
Experiment Setup: Yes
Table 4: Parameter settings for architectures. Table 5: Parameter settings for training and inference. During training, we crop the single reference motion sequence into overlapping motion patches of length Tp with stride sp. At the single-motion tokenization phase, the encoder E, codebook C, and decoder D are trained with learning rate lr1; the Local-M transformer is trained with learning rate lr2. The parameter settings for training in Table 5 were selected based on the best empirical results.
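The overlapping-patch cropping described above can be sketched as follows. This is an illustrative numpy implementation, not the authors' code; the function name and the (T, D) frames-by-features layout are assumptions:

```python
import numpy as np

def crop_patches(motion, patch_len, stride):
    """Crop a single reference motion of shape (T, D) into overlapping
    patches of `patch_len` frames, taking a new patch every `stride`
    frames (stride < patch_len gives the overlap)."""
    T = motion.shape[0]
    starts = range(0, T - patch_len + 1, stride)
    return np.stack([motion[s:s + patch_len] for s in starts])

# A 100-frame reference motion with 63-dim pose features, cropped into
# 40-frame patches every 10 frames.
motion = np.random.randn(100, 63)
patches = crop_patches(motion, patch_len=40, stride=10)
```

Each patch then serves as a training sample for the tokenizer, so a single reference sequence yields many overlapping views of its local motion patterns.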