MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xeron Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Wenhao Huang, Jie Fu, Ge Zhang

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted both objective and subjective evaluations comparing our MuPT model with state-of-the-art models such as GPT-4, ChatMusician, and MMT, covering both ABC-notation and MIDI-based approaches. Objectively, MuPT achieved the closest approximation to the ground truth, with an average gap of just 0.11, significantly outperforming ChatMusician's 0.48; this seemingly small numerical difference marks a substantial improvement in music generation quality. Notably, MuPT supports multi-track music generation, a feature absent in ChatMusician, enhancing its utility in realistic settings where such complexity is common. In experiments assessing musical structure, MuPT surpassed GPT-4 by 17% and ChatMusician by 6% on Intra Similarity and Repetition Rate, demonstrating its superior capability in handling complex musical compositions. Subjective evaluations further validated MuPT's superiority, with over 70% preference ratings against both MMT and GPT-4, underscoring its appeal to human listeners.
Researcher Affiliation | Collaboration | 1M-A-P, 2University of Waterloo, 3HKUST, 4University of Manchester, 5Shenzhen Institute of Advanced Technology, CAS, 6Vector Institute, 7QMUL, 8MBZUAI, 9Mila, 10Institute of Automation, CAS, 11Central Conservatory of Music, 12Institute for AI, 13SJTU, 14MSRA, 15University of Montreal, 16CCRMA, Stanford University, 17NJU, 18Nanyang Technological University, 19ByteDance
Pseudocode | No | The paper describes methods and models, including mathematical equations and architectural descriptions, but does not present any explicitly labeled pseudocode or algorithm blocks with structured, code-like procedures.
Open Source Code | Yes | We release a suite of state-of-the-art long-context symbolic music foundation models along with all the intermediate training checkpoints to foster community research and innovation in symbolic music modeling.
Open Datasets | Yes | The training set is built from a comprehensive collection, incorporating the Nottingham Music Dataset1, the ABC tune book of Henrik Norbeck (Ji et al., 2020), the Irishman dataset (Wu et al., 2023), and a private dataset owned by the Central Conservatory of Music (including a university library corpus in ABC and other formats that can be converted to ABC, such as MusicXML, along with internet collections). 1https://ifdo.ca/seymour/nottingham/nottingham.html
Dataset Splits | Yes | The dataset used in our empirical study is divided into two parts: a testing set and a training set. The testing set is derived from WIKIMT++ (Zhou et al., 2023), which includes 1,010 ABC notation scores from eight music genres (e.g., Pop, Jazz, Rock, R&B, Latin, etc.) along with 12 subjective emotions. Additionally, the test set comprises 207 multi-track classical music pieces manually selected from Bach's compositions. Importantly, none of these pieces overlap with the training set, ensuring that the test set can effectively evaluate the model's performance across diverse musical genres and emotions, and in generating out-of-domain music.
Hardware Specification | No | Table 10 ("Training Details for different ABC format and model settings") lists the columns Parameters, Context Length, Trained Tokens, Training Days, and Num of GPUs. The table reports the number of GPUs used (e.g., 8, 32, 64) but does not specify the type or model of these GPUs (e.g., NVIDIA A100, Tesla V100), nor any CPU or memory details.
Software Dependencies | No | MuPT utilizes a standard Transformer model architecture (Vaswani et al., 2023) in a decoder-only setup... SwiGLU activation... RMSNorm... RoPE embeddings... We chose the YouTokenToMe (YTTM) framework (YouTokenToMe, 2021)... All the models are trained using Adam (Kingma & Ba, 2014)... While these tools and frameworks are mentioned, specific version numbers for software libraries like PyTorch, TensorFlow, or CUDA, or specific versions of the Adam, SwiGLU, and RMSNorm implementations, are not provided.
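The components named in the response (RMSNorm in place of LayerNorm, SwiGLU activation, RoPE embeddings) are standard LLaMA-style choices. As an illustration of the cited normalization, here is a minimal pure-Python sketch of RMSNorm; the epsilon default is an assumption, since the paper does not state it:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root mean square.

    Unlike LayerNorm, there is no mean subtraction and no bias term.
    `eps` is an assumed default; the paper does not specify it.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# With unit gain weights, the output vector has RMS close to 1.
out = rms_norm([3.0, 4.0], [1.0, 1.0])
```

This omits the learned-parameter machinery of a real deep learning framework; in practice `weight` is a trainable per-dimension gain.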
Experiment Setup | Yes | All the models are trained using Adam (Kingma & Ba, 2014), with β1 = 0.9, β2 = 0.95, eps = 1e-8. We use a cosine learning rate schedule, decaying the learning rate from 3e-5 to 3e-6, with a warmup ratio of 0.1. We apply a weight decay of 0.1 and gradient clipping of 1.0.
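The schedule described (linear warmup over the first 10% of steps, then cosine decay from a peak of 3e-5 to a floor of 3e-6) can be sketched as below. This is an illustrative reimplementation under those assumed hyperparameter readings, not the authors' training code:

```python
import math

def lr_at_step(step, total_steps, peak_lr=3e-5, final_lr=3e-6, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay down to final_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from ~0 up to peak_lr over the warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine factor goes from 1 (end of warmup) to 0 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine

total = 1000
peak = lr_at_step(100, total)   # end of warmup: learning rate at its peak
end = lr_at_step(1000, total)   # fully decayed: learning rate at the floor
```

Weight decay (0.1) and gradient clipping (1.0) would be applied separately in the optimizer step and are not part of the schedule itself.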