Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
Authors: Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation. |
| Researcher Affiliation | Collaboration | University of California, Berkeley, USA; New York University, New York, USA; Touka Technologies. Correspondence to: Haoyu Liu <EMAIL>, Zeyu Zheng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: DDPM sampling with fine-grained harmonic control; Algorithm 2: DDPM sampling with fine-grained textural guidance |
| Open Source Code | Yes | The demo page is available at https://huajianduzhuocode.github.io/FGG-diffusion-music/, and the complete source code is released at https://github.com/huajianduzhuocode/FGG-music-code |
| Open Datasets | Yes | We use the POP909 dataset (Wang et al., 2020a) for training and evaluation. This dataset consists of 909 MIDI pieces of pop songs, each containing lead melodies, chord progression, and piano accompaniment tracks. |
| Dataset Splits | Yes | We exclude 29 pieces that are in triple meter. 90% of the data are used to train our model, and the remaining 10% are used for evaluation. |
| Hardware Specification | Yes | It takes 0.4 seconds to generate the 4-measure accompaniment on an NVIDIA RTX 6000 Ada Generation GPU. |
| Software Dependencies | No | The paper mentions an 'AdamW optimizer' but does not specify software names with version numbers for libraries or frameworks used (e.g., Python, PyTorch version). |
| Experiment Setup | Yes | We set diffusion timesteps T = 1000 with β0 = 8.5e-4 and βT = 1.2e-2. We use the AdamW optimizer with a learning rate of 5e-5, β1 = 0.9, and β2 = 0.999. We applied data augmentation by transposing each 4-measure piece into all 12 keys. ... Training is conducted with a batch size of 16, utilizing random sampling without replacement. ... resulting in a total of 23,642 iterations. |
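
The quoted experiment setup pins down the noise schedule endpoints (T = 1000, β0 = 8.5e-4, βT = 1.2e-2). A minimal sketch of how such a schedule yields the forward-process quantities is below; the linear interpolation between the endpoints is an assumption, since the quoted excerpt does not state the schedule shape.

```python
import numpy as np

# Hedged sketch: T = 1000 diffusion steps with beta_0 = 8.5e-4 and
# beta_T = 1.2e-2, as quoted. Linear spacing is an assumption.
T = 1000
betas = np.linspace(8.5e-4, 1.2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t for the forward process


def noise(x0, t, eps):
    """Forward process q(x_t | x_0): mix clean data with Gaussian noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
```

With these endpoints, `alpha_bars` decays monotonically toward zero, so late timesteps are dominated by noise, as in a standard DDPM.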
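
The setup also reports augmenting the data by transposing each 4-measure piece into all 12 keys. A sketch of that augmentation follows; representing a piece as `(midi_pitch, onset, duration)` tuples is an illustrative assumption, as the model's actual input encoding (e.g., a piano roll) may differ.

```python
# Hedged sketch of 12-key transposition augmentation. The tuple-based
# note representation is an assumption, not the paper's encoding.
def transpose_all_keys(notes):
    """Return 12 copies of `notes`, pitch-shifted by 0..11 semitones."""
    return [
        [(pitch + shift, onset, duration) for (pitch, onset, duration) in notes]
        for shift in range(12)
    ]


c_major_fragment = [(60, 0.0, 1.0), (64, 1.0, 1.0), (67, 2.0, 1.0)]  # C4, E4, G4
augmented = transpose_all_keys(c_major_fragment)
```

Each training excerpt thus contributes 12 samples, one per key, which matches the reported 12x augmentation factor.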
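
The pseudocode row names Algorithms 1 and 2 (DDPM sampling with fine-grained harmonic/textural control) but the report does not reproduce them. The sketch below shows only the generic structure such algorithms share: ancestral DDPM sampling with a per-step constraint applied. The `eps_model` stand-in and the `project` masking rule are illustrative assumptions, not the paper's actual guidance mechanism.

```python
import numpy as np

# Generic DDPM ancestral sampler with a hypothetical per-step constraint,
# loosely mirroring the shape of the paper's Algorithms 1-2. Everything
# below the schedule is an illustrative assumption.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(8.5e-4, 1.2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def eps_model(x, t):
    return np.zeros_like(x)  # stand-in for the trained noise predictor


def project(x, mask):
    return x * mask  # hypothetical: suppress disallowed pitch positions


def sample(shape, mask):
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        eps = eps_model(x, t)
        # Standard DDPM posterior mean for x_{t-1} given x_t.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else 0.0)
        x = project(x, mask)  # fine-grained control applied at every step
    return x
```

Applying the constraint inside the sampling loop, rather than once at the end, is what makes such guidance "fine-grained": every denoising step stays inside the controlled region.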