UniMuMo: Unified Text, Music, and Motion Generation

Authors: Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that UniMuMo achieves competitive results on all unidirectional generation benchmarks across music, motion, and text modalities.
Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong, 2 University of Washington, 3 The University of British Columbia, 4 University of Massachusetts Amherst, 5 MIT-IBM Watson AI Lab, 6 Cisco Research
Pseudocode | No | The paper describes the model architecture and pipeline in prose, for example: "Our pipeline consists of three main stages: a music-motion joint tokenizer that encodes music and motion sequences into discrete representations within the same space, a music-motion transformer-decoder model trained on the task of music-motion joint generation, and a music-motion captioner that generates text descriptions from music and motion features." It does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/hanyangclarence/UniMuMo
Open Datasets | Yes | With the augmented synchronized music-motion data, we can utilize existing music and motion datasets to train our unified generative model... Music4All dataset... AIST++ dataset... MusicQA dataset released by (Liu et al. 2023b)... HumanML3D test set.
Dataset Splits | No | The paper states: "More implementation details about hyperparameter choices, dataset, metrics and training/evaluation setups are in Appendix." While it mentions the datasets used, the main text does not specify explicit train/validation/test splits.
Hardware Specification | No | The paper states: "More implementation details about hyperparameter choices, dataset, metrics and training/evaluation setups are in Appendix." However, no specific hardware details (such as GPU or CPU models) are provided in the main text.
Software Dependencies | No | The paper mentions "Demucs (Défossez 2021; Rouard, Massa, and Défossez 2023)" as a tool used, but does not provide version numbers for it or any other software dependencies. It also states: "More implementation details about hyperparameter choices, dataset, metrics and training/evaluation setups are in Appendix."
Experiment Setup | Yes | Empirically, λ is set to 0.02... Empirically, µ is set to 0.85... More implementation details about hyperparameter choices, dataset, metrics and training/evaluation setups are in Appendix.