Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation

Authors: Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.
Researcher Affiliation | Collaboration | (1) University of California, Berkeley, USA; (2) New York University, New York, USA; (3) Touka Technologies. Correspondence to: Haoyu Liu <EMAIL>, Zeyu Zheng <EMAIL>.
Pseudocode | Yes | Algorithm 1: DDPM sampling with fine-grained harmonic control. Algorithm 2: DDPM sampling with fine-grained textural guidance.
Open Source Code | Yes | The demo page is available at https://huajianduzhuocode.github.io/FGG-diffusion-music/, and the complete source code is released at https://github.com/huajianduzhuocode/FGG-music-code
Open Datasets | Yes | We use the POP909 dataset (Wang et al., 2020a) for training and evaluation. This dataset consists of 909 MIDI pieces of pop songs, each containing lead melodies, chord progression, and piano accompaniment tracks.
Dataset Splits | Yes | We use the POP909 dataset (Wang et al., 2020a) for training and evaluation. We exclude 29 pieces that are in triple meter. 90% of the data are used to train our model, and the remaining 10% are used for evaluation.
Hardware Specification | Yes | It takes 0.4 seconds to generate the 4-measure accompaniment on an NVIDIA RTX 6000 Ada Generation GPU.
Software Dependencies | No | The paper mentions an 'AdamW optimizer' but does not specify software names with version numbers for the libraries or frameworks used (e.g., Python or PyTorch versions).
Experiment Setup | Yes | We set diffusion timesteps T = 1000 with β0 = 8.5e-4 and βT = 1.2e-2. We use the AdamW optimizer with a learning rate of 5e-5, β1 = 0.9, and β2 = 0.999. We applied data augmentation by transposing each 4-measure piece into all 12 keys. ... Training is conducted with a batch size of 16, utilizing random sampling without replacement. ... resulting in a total of 23,642 iterations.
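The dataset-splits row implies the following counts: 909 pieces, minus 29 triple-meter pieces, split 90/10. A minimal sketch of the resulting sizes, assuming the split is taken over the 880 remaining pieces (the report does not quote the exact counts):

```python
# Piece counts implied by the POP909 split described in the report.
total = 909
excluded_triple_meter = 29
remaining = total - excluded_triple_meter      # pieces actually used
n_train = int(remaining * 0.9)                 # 90% for training
n_eval = remaining - n_train                   # remaining 10% for evaluation
print(remaining, n_train, n_eval)              # → 880 792 88
```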
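The experiment-setup row reports a DDPM noise schedule with T = 1000, β0 = 8.5e-4, and βT = 1.2e-2. A minimal sketch of how such a schedule is commonly constructed, assuming linear interpolation between the two endpoints (the report does not quote the paper's exact schedule shape):

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_0=8.5e-4, beta_T=1.2e-2):
    """Linearly interpolate the per-step noise variances beta_t."""
    return np.linspace(beta_0, beta_T, T)

betas = linear_beta_schedule()
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t used in the forward process q(x_t | x_0)
```

With these endpoints, the cumulative product \bar{alpha}_T decays to well under 1%, so x_T is close to pure Gaussian noise, which is what DDPM sampling assumes at the first reverse step.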
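The pseudocode row lists Algorithms 1 and 2, both of which inject fine-grained (harmonic or textural) control into DDPM sampling. A generic sketch of a standard DDPM ancestral sampling loop with a per-step guidance hook, where `denoise_model` and `apply_guidance` are hypothetical placeholder interfaces, not the authors' implementation:

```python
import numpy as np

def ddpm_sample_with_guidance(denoise_model, apply_guidance, shape,
                              betas, rng=np.random.default_rng(0)):
    """Standard DDPM ancestral sampling with a per-step guidance hook.

    denoise_model(x_t, t) -> predicted noise eps_hat (hypothetical interface).
    apply_guidance(x, t)  -> x adjusted toward the fine-grained control
                             (hypothetical; stands in for the paper's
                             harmonic/textural guidance steps).
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = denoise_model(x, t)
        # DDPM posterior mean of x_{t-1} given x_t and the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else 0.0)
        x = apply_guidance(x, t)  # fine-grained control applied at every step
    return x
```

Applying the hook at every reverse step, rather than only conditioning at t = T, is what makes the control "fine-grained": the sample is steered toward the constraint throughout denoising instead of once at the start.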