Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
Authors: Louis Bradshaw, Simon Colton
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. The data pipeline we use is multi-stage... We provide an in-depth analysis of our techniques, offering statistical insights, and investigate the content by extracting metadata tags... We evaluate the effectiveness of the components in our data pipeline. Where applicable, we compare our methods to those used in previous work, in particular the GiantMIDI-Piano, ATEPP, and PiJAMA datasets. For baselines and to determine ground truth, we relied on human labels... Classification precision, recall, and F1 scores can be seen in Table 3. |
| Researcher Affiliation | Academia | Louis Bradshaw, Simon Colton Queen Mary University of London EMAIL |
| Pseudocode | No | The paper describes algorithms in text, such as the pseudo-labeling process (Figure 1 description) and the sliding-window technique for audio segmentation, but it does not contain any structured pseudocode blocks or figures explicitly labeled as algorithms. For example, it describes the audio segmentation as: 'To do this, we employ a sliding-window based technique adapted from standard approaches (Keogh et al., 2004), aimed at accurately removing non-piano content whilst being robust to short-lived classification mistakes. Given an audio recording, we score each five-second interval, sampled with a one-second stride, by passing the inputs through our model.' |
| Open Source Code | Yes | Dataset available at https://github.com/loubbrad/aria-midi. We outline a process for distilling an audio source-separation model to train a classifier capable of accurately identifying and segmenting diverse real-world piano recordings, which we open-source (https://github.com/loubbrad/aria-cl). We used a Whisper-based model, Aria-AMT (Bradshaw et al.), to transcribe the segmented audio recordings into MIDI files (https://github.com/EleutherAI/aria-amt). |
| Open Datasets | Yes | We introduce an extensive new dataset of MIDI files... Dataset available at https://github.com/loubbrad/aria-midi. Initial investigations revealed that relying on well-known datasets such as MAESTRO (Hawthorne et al., 2018) and Audio Set (Gemmeke et al., 2017) was insufficient... We also used the GiantMIDI-Piano audio files, the Jazz Trio Database (Cheston et al., 2024)... |
| Dataset Splits | No | The paper describes selecting a 'random sample of 250 videos' and 'a random sample of 250 audio recordings' for evaluating and analyzing its pipeline components, excluding those used during training. However, it does not provide explicit training, validation, or test splits, whether as percentages, counts, or references to predefined partitions, for either the released dataset or the models it trains (e.g., the audio classifier). Without fixed data partitions, the full experimental setup cannot be directly reproduced. |
| Hardware Specification | Yes | Transcription of the 100,629 hours of audio took 765 hours using an NVIDIA H100 GPU with a batch size of 128... In comparison, classification of 100,000 hours of audio using our model only took 20 A100 hours, I/O being the main bottleneck. |
| Software Dependencies | No | The paper mentions using 'the 70B parameter version of Llama 3.1 (Dubey et al., 2024)' for the language model, applying 'the MVSep Piano source-separation model (Uhlich et al., 2024; Fabbro, 2024; Solovyev et al., 2023)', and using a 'Whisper-based model, Aria-AMT (Bradshaw et al.)' for transcription. It also states the model was trained 'using the AdamW optimizer (Loshchilov and Hutter, 2019)' and results were calculated 'using the mir_eval library (Raffel et al., 2014)'. However, specific version numbers for software libraries, frameworks, or optimizers (like PyTorch version, TensorFlow version, or mir_eval version) are not provided. |
| Experiment Setup | Yes | For our solo-piano classifier, ... We trained the model for ten epochs using the AdamW optimizer (Loshchilov and Hutter, 2019) with β1, β2 = 0.9, 0.95, ϵ = 1e-6 and an L2 weight decay of 0.01. A linear learning rate scheduler was used, decaying to 10% of the initial learning rate after a warmup over the first 500 optimizer steps... The parameters d and λ control the sensitivity and minimum length of non-piano segments, which we set to 3 and 0.5 respectively. |
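The sliding-window segmentation quoted in the Pseudocode row (scoring each five-second window with a one-second stride) can be sketched as below. This is a minimal illustration, not the authors' implementation: `classify` is a stand-in for their solo-piano classifier, and the raw-sample representation of the audio is an assumption.

```python
from typing import Callable, List, Tuple

def score_windows(
    audio: List[float],
    sample_rate: int,
    classify: Callable[[List[float]], float],
    window_s: float = 5.0,   # 5-second windows, per the paper
    stride_s: float = 1.0,   # 1-second stride, per the paper
) -> List[Tuple[float, float]]:
    """Score each window of `audio` with `classify`.

    Returns (window_start_seconds, score) pairs. If the recording is
    shorter than one window, the single partial window is scored.
    """
    win = int(window_s * sample_rate)
    hop = int(stride_s * sample_rate)
    scores = []
    for start in range(0, max(len(audio) - win, 0) + 1, hop):
        score = classify(audio[start:start + win])
        scores.append((start / sample_rate, score))
    return scores
```

Thresholding these per-window scores, with the reported d and λ parameters controlling sensitivity and the minimum length of non-piano segments, would then yield the segment boundaries.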
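The learning-rate schedule described in the Experiment Setup row (linear warmup over the first 500 optimizer steps, then linear decay to 10% of the initial rate) can be written as a pure multiplier function, e.g. for use with a `LambdaLR`-style scheduler. The paper's excerpt does not state the total step count, so `total_steps` here is an illustrative assumption.

```python
def lr_scale(step: int, warmup_steps: int = 500,
             total_steps: int = 10_000, floor: float = 0.1) -> float:
    """Multiplier on the initial learning rate at optimizer step `step`.

    Linear warmup over `warmup_steps` (500, per the paper), then linear
    decay to `floor` (10% of the initial LR, per the paper) by
    `total_steps` (an assumed value; not reported in the excerpt).
    """
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 1.0 - (1.0 - floor) * min(progress, 1.0)
```

The reported AdamW settings (β1 = 0.9, β2 = 0.95, ϵ = 1e-6, weight decay 0.01) would then be passed to the optimizer alongside this schedule.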