Generative Monoculture in Large Language Models
Authors: Fan Wu, Emily Black, Varun Chandrasekaran
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the prevalence of generative monoculture through analysis of book review and code generation tasks, and find that simple countermeasures such as altering sampling or prompting strategies are insufficient to mitigate the behavior. |
| Researcher Affiliation | Academia | Fan Wu (1), Emily Black (2), Varun Chandrasekaran (1); (1) University of Illinois Urbana-Champaign, (2) New York University. Equal advising. EMAIL, EMAIL |
| Pseudocode | No | The paper defines concepts and describes methods in natural language and figures, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We open source our code at https://github.com/GeMoLLM/GeMO. |
| Open Datasets | Yes | For D_src, we use the Goodreads dataset (Wan et al., 2019), which contains multiple books with several reviews each. ... For D_src, we chose the Code Contests dataset (Li et al., 2022), a competitive programming problem dataset where each problem comes with multiple correct and incorrect solutions. |
| Dataset Splits | Yes | For D_src, we use the Goodreads dataset (Wan et al., 2019)... and craft a final dataset of N = 742 books with English titles, and ∀i, n_i = 10 reviews per book... For D_src, we chose the Code Contests dataset (Li et al., 2022)... For each problem in the subset, we randomly sampled ∀i, n_i = 20 correct solutions from all of the n_i^correct solutions for that problem. |
| Hardware Specification | Yes | The book review generation (N = 742, n = 10, max new tokens=500) on open-source models took around 10 hours on one H100 card per run, i.e., per combination of sampling parameters (T and p) and prompts. |
| Software Dependencies | No | The paper mentions various tools and models such as Hugging Face sentiment classifier, BERTopic, NLTK library, and COPYDETECT, but it does not specify concrete version numbers for these software components, nor for the programming language used. |
| Experiment Setup | Yes | We performed nucleus sampling (Holtzman et al., 2019) with various sampling parameters: (a) temperature T ∈ {0.5, 0.8, 1.0, 1.2, 1.5}, and (b) top-p p ∈ {0.90, 0.95, 0.98, 1.00}. We also experimented with two candidates for P_task: prompt (1) "Write a personalized review of the book titled {title}:", and prompt (2) "Write a book review for the book titled {title} as if you are {person}:". |
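The experiment setup row describes a grid over nucleus-sampling parameters and two prompt templates. A minimal sketch of that run-configuration grid is below; the helper names (`run_configs`, `PROMPTS`) are illustrative assumptions, not identifiers from the paper's released code:

```python
from itertools import product

# Sampling grid reported in the paper (nucleus sampling).
TEMPERATURES = [0.5, 0.8, 1.0, 1.2, 1.5]
TOP_PS = [0.90, 0.95, 0.98, 1.00]

# The two candidate task prompts; {title} / {person} are filled in per book.
PROMPTS = [
    "Write a personalized review of the book titled {title}:",
    "Write a book review for the book titled {title} as if you are {person}:",
]

def run_configs():
    """Yield one generation config per (temperature, top_p, prompt) combination."""
    for (t, p), template in product(product(TEMPERATURES, TOP_PS), PROMPTS):
        yield {
            "temperature": t,
            "top_p": p,
            "prompt_template": template,
            "max_new_tokens": 500,  # matches the reported generation length
        }

configs = list(run_configs())
print(len(configs))  # 5 temperatures x 4 top-p values x 2 prompts = 40 runs
```

Per the hardware row, each such (T, p, prompt) combination corresponds to roughly one 10-hour H100 run for the book-review task.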