Diversity-Rewarded CFG Distillation
Authors: Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the MusicLM text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with higher quality-diversity than the base model augmented with CFG. |
| Researcher Affiliation | Industry | Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin*, Alexandre Ramé* — Google DeepMind; * Equal advisory contribution. Correspondence to: Geoffrey Cideron <EMAIL> |
| Pseudocode | No | The paper includes mathematical equations (1-6) and derivations in Appendix A but does not feature any clearly labeled pseudocode blocks or algorithms formatted with structured steps. |
| Open Source Code | No | Explore our generations at google-research.github.io/seanet/musiclm/diverse_music. This link directs to a webpage for exploring generated music samples, not to source code for the methodology. |
| Open Datasets | Yes | We use the prompt dataset described in Section 4.1 from Cideron et al. (2024)... and prompts derived from MusicCaps (Agostinelli et al., 2023). ...using 16 kHz audio excerpts sourced from the same training dataset as Agostinelli et al. (2023). |
| Dataset Splits | No | The paper mentions using a batch size of 128 and provides details about human evaluation prompts (101 for quality, 50 for diversity), but it does not specify training, validation, or test splits for the datasets used to train the models. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using a 'LLM transformer-based architecture', 'RL algorithm is a variant of REINFORCE (Williams, 1992)', and 'semi-hard triplet loss (Schroff et al., 2015)' but does not specify any software names with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | For CFG, we set γ = 3 and use the negative prompt "Bad audio quality". ...with temperature T = 0.99. ...We use a batch size of 128 and a learning rate of 0.00015 for all our finetunings. |
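The CFG hyperparameters quoted in the Experiment Setup row (γ = 3, a negative prompt, temperature T = 0.99) can be sketched as a guided next-token sampler. This is a minimal illustration of standard classifier-free guidance over logits, not the paper's actual implementation; the function name, shapes, and toy vocabulary are assumptions.

```python
import numpy as np

def cfg_sample(logits_cond, logits_neg, gamma=3.0, temperature=0.99, rng=None):
    """Sample one token with classifier-free guidance.

    logits_cond: next-token logits conditioned on the text prompt.
    logits_neg:  logits conditioned on the negative prompt
                 (e.g. "Bad audio quality").
    gamma: guidance scale; gamma = 1 recovers plain conditional sampling.
    """
    rng = rng or np.random.default_rng(0)
    # Extrapolate away from the negative-prompt distribution.
    guided = logits_neg + gamma * (logits_cond - logits_neg)
    # Temperature-scaled softmax, computed stably.
    z = guided / temperature
    z -= z.max()
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))

# Toy usage with a 4-token vocabulary.
cond = np.array([2.0, 0.5, -1.0, 0.0])
neg = np.array([0.0, 1.5, -0.5, 0.0])
token = cfg_sample(cond, neg, gamma=3.0, temperature=0.99)
```

With γ > 1 the guided logits push probability mass toward tokens the positive prompt favors over the negative prompt, which is why a quality-describing negative prompt raises perceived quality at some cost to diversity.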
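The "finetuned-then-merged model" mentioned in the Research Type row refers to combining finetuned checkpoints by interpolating their weights. The sketch below shows plain linear interpolation (LERP) between two same-architecture parameter dictionaries; the parameter names and toy values are hypothetical, and this is an illustration of the general technique rather than the paper's exact merging recipe.

```python
def merge_weights(params_a, params_b, alpha=0.5):
    """Linearly interpolate two checkpoints that share an architecture.

    alpha = 0 returns params_a; alpha = 1 returns params_b.
    """
    assert params_a.keys() == params_b.keys()
    return {k: (1.0 - alpha) * params_a[k] + alpha * params_b[k]
            for k in params_a}

# Toy usage: blend a quality-focused and a diversity-focused checkpoint.
quality_model = {"w": 1.0, "b": 0.0}
diversity_model = {"w": 0.0, "b": 2.0}
merged = merge_weights(quality_model, diversity_model, alpha=0.5)
```

Sweeping alpha at deployment time traces a quality-diversity trade-off without retraining, which is the practical appeal of merging over picking a single finetuned checkpoint.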