Detecting Music Performance Errors with Transformers

Authors: Benjamin Shiue-Hal Chou, Purvish Jajal, Nicholas John Eliopoulos, Tim Nadolsky, Cheng-Yun Yang, Nikita Ravi, James C. Davis, Kristen Yeon-Ji Yun, Yung-Hsiang Lu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Polytune and previous works on Coco Chorales-E and MAESTRO-E, which encompass 14 different instruments and a variety of performance errors. To evaluate error detection performance, we adapt the transcription F1 score commonly used in music transcription tasks (Raffel et al. 2014). We present a comparison of our method against the baseline across different categories for Error F1, precision, and recall. As shown in Tab. 3, our method generally outperforms the baseline derived from (Benetos, Klapuri, and Dixon 2012; Wang, Ewert, and Dixon 2017)."
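The Error F1 mentioned above is the standard harmonic mean of precision and recall over matched notes. A minimal sketch of that computation (the function name and count-based interface are assumptions, not the paper's code):

```python
def error_f1(n_correct, n_predicted, n_reference):
    """Note-level precision/recall/F1, in the style of transcription
    evaluation (Raffel et al. 2014): counts of correctly matched,
    predicted, and reference error notes."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_reference if n_reference else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall

# e.g. 8 correctly flagged errors out of 10 predictions, 12 in the reference
f1, p, r = error_f1(8, 10, 12)
```

In practice the matching of predicted to reference notes (onset/pitch tolerances) is handled by a package such as mir_eval before these counts are taken.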
Researcher Affiliation | Academia | "Benjamin Shiue-Hal Chou, Purvish Jajal, Nicholas John Eliopoulos, Tim Nadolsky, Cheng-Yun Yang, Nikita Ravi, James C. Davis, Kristen Yeon-Ji Yun, Yung-Hsiang Lu. School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA, 47907."
Pseudocode | Yes | "Algorithm 1: MIDI Error Generation Algorithm. This algorithm introduces errors into MIDI files. Abbreviations: PC (pitch change), TS (timing shift), EN (extra note)."
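A minimal Python sketch of how such an error-generation step might look, using the three error types named above. The note representation, function name, and offset magnitudes here are illustrative assumptions, not the paper's implementation:

```python
import random

def inject_errors(notes, rate=0.2, seed=0):
    """Hypothetical sketch of Algorithm 1: randomly corrupt a fraction of
    notes. Each note is a dict with 'pitch' (MIDI number) and 'onset' (s)."""
    rng = random.Random(seed)
    corrupted = []
    for note in notes:
        note = dict(note)  # copy so the input list is untouched
        if rng.random() < rate:
            kind = rng.choice(["PC", "TS", "EN"])
            if kind == "PC":       # pitch change: shift by 1-2 semitones
                note["pitch"] += rng.choice([-2, -1, 1, 2])
            elif kind == "TS":     # timing shift: perturb the onset
                note["onset"] += rng.uniform(-0.05, 0.05)
            else:                  # extra note: insert a spurious neighbor
                corrupted.append(dict(note, pitch=note["pitch"] + 1))
        corrupted.append(note)
    return corrupted
```

With `rate=0.0` the output equals the input; with higher rates, PC/TS mutate existing notes while EN grows the list.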
Open Source Code | Yes | Code: https://github.com/ben2002chou/Polytune
Open Datasets | Yes | "Thus, we introduce an algorithm for synthetically generating errors in existing music datasets, Coco Chorales (Wu et al. 2022a) and MAESTRO (Hawthorne et al. 2018). We name the resulting augmented datasets Coco Chorales-E and MAESTRO-E, respectively."
Dataset Splits | No | "All results are based on a combined test set of 4401 tracks."
Hardware Specification | Yes | "All models were trained on an NVIDIA A100-80GB GPU running a Linux operating system. The datasets introduced in this work, MAESTRO-E and Coco Chorales-E, were generated using AMD EPYC 7713 3.0 GHz CPUs."
Software Dependencies | Yes | "We used PyTorch 2.3.0 and Hugging Face Transformers 4.40.1 for model design and training. The mir_eval package is used for evaluating Error Detection F1 scores."
Experiment Setup | Yes | "To address this imbalance, we use a weighted cross-entropy loss, as shown in Equation 1. Equation 1 defines the weighted cross-entropy loss L = (1/N) Σ_i α(y_i) · CE(y_i, ŷ_i), averaged over N tokens, where CE(y_i, ŷ_i) is the cross-entropy between the true label y_i and the prediction ŷ_i, weighted by α(y_i). For our training, α(y_i) is 10 when y_i is an error token and 1 otherwise. ... We introduce errors into each MIDI file by selecting notes according to a Poisson distribution with mean rate parameter λ, where λ is sampled from a uniform distribution U(0.1, 0.4). The selected notes are then assigned an error type, and their time and pitch are augmented accordingly. Offset magnitudes for time and pitch are sampled from two truncated normal distributions, P and Q, with mean 0 and standard deviations 1 and 0.02, respectively."
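The weighted loss in Equation 1 can be sketched in a few lines. This is a plain-Python illustration of the averaging and the α(y_i) weighting, assuming each prediction is summarized by the probability assigned to the true token; the token names and the is-error test are hypothetical:

```python
import math

def weighted_ce(true_tokens, true_token_probs, error_weight=10.0):
    """Sketch of Equation 1: per-token cross-entropy -log p(y_i),
    scaled by alpha(y_i) = 10 for error tokens and 1 otherwise,
    averaged over the N tokens in the sequence."""
    total = 0.0
    for y, p in zip(true_tokens, true_token_probs):
        alpha = error_weight if y.startswith("err") else 1.0
        total += alpha * -math.log(p)  # CE reduces to -log p of true label
    return total / len(true_tokens)
```

In a PyTorch training loop the same effect is usually achieved by multiplying the unreduced per-token cross-entropy by a weight mask before averaging.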