SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training

Authors: Jiaxing Yu, Xinda Wu, Yunfei Xu, Tieyao Zhang, Songruoyao Wu, Le Ma, Kejun Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — The objective and subjective results indicate that SongGLM can generate melodies from lyrics with significant improvements in both alignment and harmony, outperforming all previous baseline methods. Specifically, DA indicates that SongGLM-small performs best in lyric-melody alignment, while DP, DD, DIOI, and MD suggest that SongGLM-small is the most capable of ensuring harmony between lyrics and melodies. Table 2 shows the subjective results, from which we can see that, for the melody itself, SongGLM-small generates diverse and consistent melodies. For the overall song, SongGLM-small not only ensures rhythmic and structural consistency between lyrics and melody, but also achieves the best results in singability and overall performance.
Researcher Affiliation: Collaboration — College of Computer Science and Technology, Zhejiang University; AI Center, Guangdong OPPO Mobile Telecommunications Corp., Ltd.; Innovation Center of Yangtze River Delta, Zhejiang University
Pseudocode: No — The paper describes its methods with diagrams and text, but contains no explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code: No — The paper neither states that source code is released nor provides a link to a code repository.
Open Datasets: Yes — "Furthermore, we construct a large-scale lyric-melody paired dataset based on MelodyNet (Wu et al. 2023), that contains more than 200,000 English song pieces for pre-training and fine-tuning." The paper also states: "In this paper, we acquire approximately 1.6 million raw MIDI data from MelodyNet (Wu et al. 2023), and construct a large-scale lyric-melody paired dataset with varied word-note alignments, including both one-to-one and one-to-multiple alignments."
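The one-to-one and one-to-multiple word-note alignments mentioned above can be captured by a small data structure. This is a minimal illustrative sketch, not the authors' implementation; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WordNoteAlignment:
    """One lyric word aligned to one or more melody note indices."""
    word: str
    note_indices: List[int]

def alignment_kind(a: WordNoteAlignment) -> str:
    # A single note index is a one-to-one alignment; several indices form a
    # one-to-multiple alignment (e.g., one syllable sung over multiple notes).
    return "one-to-one" if len(a.note_indices) == 1 else "one-to-multiple"

# Toy example: "shining" spans two notes, "star" sits on a single note.
pair = [
    WordNoteAlignment("shining", [0, 1]),
    WordNoteAlignment("star", [2]),
]
kinds = [alignment_kind(a) for a in pair]
```

Keeping note indices (rather than copies of the notes) keeps the alignment layer independent of the melody representation.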
Dataset Splits: No — "We extract LMD-full dataset and Reddit-sourced dataset from the processed dataset, with a total of 8,195 pieces, for fine-tuning and use the remaining part for pre-training." This specifies which portions are used for fine-tuning and pre-training, but gives no explicit train/validation/test splits within those portions.
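The partition described above (LMD-full and Reddit-sourced pieces for fine-tuning, the remainder for pre-training) could be implemented along these lines. This is a hedged sketch assuming pieces carry a provenance tag; the field names and source labels are hypothetical, not taken from the paper:

```python
def partition_corpus(pieces, finetune_sources=("LMD-full", "Reddit")):
    """Split a processed corpus into pre-training and fine-tuning subsets
    by provenance tag. `pieces` is assumed to be a list of dicts with a
    'source' key identifying where each piece came from."""
    finetune = [p for p in pieces if p["source"] in finetune_sources]
    pretrain = [p for p in pieces if p["source"] not in finetune_sources]
    return pretrain, finetune

# Toy corpus with made-up provenance tags.
corpus = [
    {"id": 1, "source": "LMD-full"},
    {"id": 2, "source": "MelodyNet"},
    {"id": 3, "source": "Reddit"},
]
pretrain, finetune = partition_corpus(corpus)
```

Note that this only reproduces the fine-tuning/pre-training partition; it says nothing about train/validation/test splits, which is exactly the gap the assessment flags.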
Hardware Specification: No — The paper provides no details about the hardware used to run experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies: No — The paper mentions the GLM and Transformer architectures but does not list any software libraries or dependencies with version numbers.
Experiment Setup: No — The paper describes model configurations (SongGLM-small, SongGLM-base) and the datasets used for pre-training and fine-tuning, but the main text omits specific setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) and optimizer settings.