MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Authors: Philippe Pasquier, Jeff Ens, Nathan Fradet, Paul Triana, Davide Rizzotti, Jean-Baptiste Rolland, Maryam Safi

AAAI 2025

Reproducibility Variable — Result. LLM Response
Research Type — Experimental. "We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore the integration and evaluation of MIDI-GPT into commercial products, as well as several artistic works produced using it."
Researcher Affiliation — Collaboration. Philippe Pasquier¹, Jeff Ens¹, Nathan Fradet¹, Paul Triana¹, Davide Rizzotti¹, Jean-Baptiste Rolland², Maryam Safi². ¹Metacreation Lab, Simon Fraser University, Vancouver, Canada; ²Steinberg Media Technologies GmbH, Hamburg, Germany.
Pseudocode — No. No explicit pseudocode or algorithm blocks are provided in the main text; Figures 1 and 2 illustrate tokenization schemes but are not pseudocode.
Open Source Code — Yes. "MIDI-GPT has been released and is seeing real-world usage in several contexts, which directly supports our assertion that MIDI-GPT is a practical model for computer-assisted composition." The release page (https://www.metacreation.net/projects/mmm) links to models and various examples of generations. "We present MIDI-GPT, a style-agnostic generative system released as an Open RAIL-M licensed MMM model (Ens and Pasquier 2020)."
Open Datasets — Yes. "We use the new GigaMIDI (Lee et al. 2024) dataset, which builds on the MetaMIDI dataset (Ens and Pasquier 2021), to train with a split of: p_train = 80%, p_valid = 10%, and p_test = 10%."
Dataset Splits — Yes. "We use the new GigaMIDI (Lee et al. 2024) dataset, which builds on the MetaMIDI dataset (Ens and Pasquier 2021), to train with a split of: p_train = 80%, p_valid = 10%, and p_test = 10%."
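The 80/10/10 partitioning could be reproduced with a sketch like the one below. The shuffling procedure, the seed, and the `split_dataset` helper name are assumptions for illustration; the paper does not describe how files are assigned to partitions.

```python
import random

def split_dataset(paths, p_train=0.8, p_valid=0.1, seed=0):
    """Shuffle a list of MIDI file paths and cut it into
    train/valid/test partitions (remainder goes to test).
    Hypothetical helper; the paper only states the 80/10/10 ratios."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * p_train)
    n_valid = int(n * p_valid)
    return (paths[:n_train],
            paths[n_train:n_train + n_valid],
            paths[n_train + n_valid:])

train, valid, test = split_dataset([f"file_{i}.mid" for i in range(100)])
print(len(train), len(valid), len(test))  # → 80 10 10
```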
Hardware Specification — Yes. "Training to convergence typically takes 2-3 days using 4 V100 GPUs."
Software Dependencies — No. "Our model is built on the GPT-2 architecture (Radford et al. 2019), implemented using the Hugging Face Transformers library (Wolf et al. 2020). This tokenization is implemented in MidiTok (Fradet et al. 2021) for ease of use." No specific version numbers are provided for these libraries.
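To make the tokenization row concrete, here is a toy sketch of an MMM-style multitrack token stream, where each track is serialized with explicit track and bar delimiters. The token names and the `track_to_tokens` helper are illustrative placeholders, not the actual vocabulary used by MIDI-GPT or MidiTok.

```python
# Toy illustration of an MMM-style multitrack token stream.
# Token names are placeholders, NOT MidiTok's real vocabulary.
def track_to_tokens(instrument, bars):
    """Serialize one track (a list of bars, each a list of
    (pitch, duration) pairs) into a flat token sequence."""
    tokens = ["TRACK_START", f"INSTRUMENT={instrument}"]
    for bar in bars:
        tokens.append("BAR_START")
        for pitch, duration in bar:
            tokens += [f"NOTE_ON={pitch}", f"DURATION={duration}"]
        tokens.append("BAR_END")
    tokens.append("TRACK_END")
    return tokens

# One piano track (program 0) with a single bar holding C4 and E4.
print(track_to_tokens(0, [[(60, 4), (64, 4)]]))
```

Because every track is a self-contained delimited span, a model trained on such streams can condition generation on any subset of existing tracks, which is the basis of MMM-style bar- and track-level infilling.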
Experiment Setup — Yes. "The configuration of this model includes 8 attention heads and 6 layers, utilizing an embedding size of 512 and an attention window encompassing 2048 tokens. This results in approximately 20 million parameters. For each batch, we pick 32 random MIDI files (batch size)... We train with the Adam optimizer, a learning rate of 10⁻⁴, without dropout."
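The reported ~20 million parameters can be sanity-checked with a back-of-envelope count for a 6-layer, 512-dimensional GPT-2-style decoder. Biases and layer norms are ignored, and the vocabulary size of 500 is an assumption, since the paper does not state the tokenizer vocabulary.

```python
def transformer_param_estimate(n_layers=6, d_model=512, n_ctx=2048, vocab=500):
    """Rough parameter count for a GPT-2-style decoder, ignoring
    biases and layer norms. vocab=500 is an assumed tokenizer size."""
    attn = 4 * d_model * d_model           # Q, K, V and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection (4x inner dim)
    per_layer = attn + mlp
    embeddings = vocab * d_model + n_ctx * d_model  # token + positional tables
    return n_layers * per_layer + embeddings

print(transformer_param_estimate())  # → 20178944, i.e. ~20M with these assumed sizes
```

The estimate lands within a few percent of the paper's "approximately 20 million parameters", so the stated architecture figures are internally consistent.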