Text2midi: Generating Symbolic Music from Captions

Authors: Keshav Bhandari, Abhinaba Roy, Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct comprehensive empirical evaluations, incorporating both automated and human studies, that show our model generates MIDI files of high quality that are indeed controllable by text captions that may include music theory terms such as chords, keys, and tempo."
Researcher Affiliation | Academia | "¹Queen Mary University of London, ²Singapore University of Technology and Design; EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes the mathematical formulation and architecture of the model with figures, but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "We release the code and music samples on our demo page for users to interact with text2midi." Code: https://github.com/AMAAI-Lab/Text2midi
Open Datasets | Yes | "MidiCaps is a dataset of 168,401 unique MIDI files with text captions (Melechovsky, Roy, and Herremans 2024). The MIDI files were originally provided in the Lakh MIDI dataset (Raffel 2016), released under the CC-BY 4.0 license. ... SymphonyNet (Liu et al. 2022) is a comprehensive dataset of symphonic music."
Dataset Splits | Yes | "We use the provided training set (~90% of the data) to train the model in our experiments. ... We consider 100 (5%) randomly selected samples from the MidiCaps test set (Melechovsky, Roy, and Herremans 2024)."
Hardware Specification | Yes | "Our models are trained on 6 NVIDIA L40S 48 GB GPUs."
Software Dependencies | No | The paper mentions the MidiTok library, the Music21 library, the FLAN-T5 model, and the Adam optimizer, along with the REMI+ tokenizer, but it does not specify version numbers for these components, nor for general dependencies such as the programming language or deep learning framework.
Experiment Setup | Yes | "For pretraining, we train for 100 epochs, with a batch size of 4 and gradient accumulation set to 4. For finetuning on MidiCaps, we trained for 30 epochs. For both runs, we use the Adam optimizer (Kingma and Ba 2014) coupled with a cosine learning rate schedule with a warm-up of 20,000 steps. For pretraining, our base learning rate is 1e-4 whereas for finetuning, we use a reduced base learning rate of 1e-6."
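The reported splits (~90% of MidiCaps' 168,401 files for training, with 100 randomly drawn test samples used for evaluation) can be sketched as follows. This is a minimal sketch, not the authors' code: the function name, seed, and shuffling strategy are illustrative assumptions, since the paper uses the splits provided with MidiCaps rather than re-splitting.

```python
import random

def make_splits(n_files, train_frac=0.90, eval_samples=100, seed=0):
    """Illustrative split: ~train_frac of files for training, the rest
    for test, then a fixed-size evaluation subset drawn from the test
    set. The exact file lists and seed are not given in the paper."""
    rng = random.Random(seed)
    indices = list(range(n_files))
    rng.shuffle(indices)
    n_train = int(n_files * train_frac)
    train, test = indices[:n_train], indices[n_train:]
    eval_subset = rng.sample(test, eval_samples)
    return train, test, eval_subset

train, test, eval_subset = make_splits(168_401)
```

Drawing the evaluation subset with a fixed seed keeps the 100-sample human/automated evaluation reproducible across runs.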
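The learning-rate schedule described in the experiment setup (a 20,000-step warm-up into a cosine schedule) can be sketched as a plain function. The linear warm-up shape and the decay-to-zero endpoint are assumptions; the paper only names the schedule type, warm-up length, and base rates.

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, warmup=20_000):
    """Cosine learning-rate schedule with linear warm-up.

    base_lr=1e-4 matches the reported pretraining rate; pass
    base_lr=1e-6 for finetuning. Decaying to zero at total_steps
    is an assumption, not stated in the paper."""
    if step < warmup:
        # Linear warm-up from 0 to base_lr over `warmup` steps.
        return base_lr * step / warmup
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In practice this per-step rate would be fed to the Adam optimizer (e.g. via a PyTorch `LambdaLR`-style scheduler), with the effective batch size of 16 coming from batch size 4 times gradient accumulation 4.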