Generative Monoculture in Large Language Models

Authors: Fan Wu, Emily Black, Varun Chandrasekaran

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate the prevalence of generative monoculture through analysis of book review and code generation tasks, and find that simple countermeasures such as altering sampling or prompting strategies are insufficient to mitigate the behavior.
Researcher Affiliation | Academia | Fan Wu¹, Emily Black², Varun Chandrasekaran¹ — ¹University of Illinois Urbana-Champaign, ²New York University. Equal advising. EMAIL, EMAIL
Pseudocode | No | The paper defines concepts and describes methods in natural language and figures, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We open source our code at https://github.com/GeMoLLM/GeMO.
Open Datasets | Yes | For D_src, we use the Goodreads dataset (Wan et al., 2019), which contains multiple books with several reviews each. ... For D_src, we chose the Code Contests dataset (Li et al., 2022), a competitive programming problem dataset where each problem comes with multiple correct and incorrect solutions.
Dataset Splits | Yes | For D_src, we use the Goodreads dataset (Wan et al., 2019)... and craft a final dataset of N = 742 books with English titles, and ∀i, n_i = 10 reviews per book... For D_src, we chose the Code Contests dataset (Li et al., 2022)... For each problem in the subset, we randomly sampled ∀i, n_i = 20 correct solutions from all of the n_i^correct solutions for that problem.
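The per-item subsampling described above (n_i = 10 reviews per book, n_i = 20 correct solutions per problem) can be sketched as follows. This is a hypothetical helper, not code from the paper's repository; the function name and data layout are assumptions.

```python
import random

def subsample_per_key(items_per_key, n, seed=0):
    """Randomly sample n items for each key (book or problem).

    Keys with fewer than n items are dropped, mirroring the paper's
    construction of a fixed-size subset per book/problem. Hypothetical
    sketch; the actual filtering criteria in the paper may differ.
    """
    rng = random.Random(seed)
    return {key: rng.sample(items, n)
            for key, items in items_per_key.items()
            if len(items) >= n}

# Toy example: one book with enough reviews, one without.
reviews = {
    "Book A": [f"review {j}" for j in range(15)],
    "Book B": [f"review {j}" for j in range(4)],
}
subset = subsample_per_key(reviews, n=10)
```

Here `subset` keeps only "Book A", with exactly 10 of its 15 reviews; the same helper would apply to sampling 20 correct solutions per Code Contests problem.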
Hardware Specification | Yes | The book review generation (N = 742, n = 10, max_new_tokens=500) on open-source models took around 10 hours on one H100 card per run, i.e., per combination of sampling parameters (T and p) and prompts.
Software Dependencies | No | The paper mentions various tools and models such as a Hugging Face sentiment classifier, BERTopic, the NLTK library, and COPYDETECT, but it does not specify concrete version numbers for these software components, nor for the programming language used.
Experiment Setup | Yes | We performed nucleus sampling (Holtzman et al., 2019) with various sampling parameters: (a) temperature T ∈ {0.5, 0.8, 1.0, 1.2, 1.5}, and (b) top-p ∈ {0.90, 0.95, 0.98, 1.00}. We also experimented with two candidates for P_task: prompt (1) "Write a personalized review of the book titled {title}:", and prompt (2) "Write a book review for the book titled {title} as if you are {person}:".
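The sweep described in the setup row can be enumerated as a grid of generation configurations; each (temperature, top-p, prompt) combination corresponds to one run, matching the per-run timing reported above. A minimal sketch, assuming a Hugging Face-style keyword convention (`temperature`, `top_p`, `do_sample`, `max_new_tokens`); the paper's actual driver code is not shown here.

```python
from itertools import product

# Sampling grid and prompt templates as reported in the experiment setup.
TEMPERATURES = [0.5, 0.8, 1.0, 1.2, 1.5]
TOP_PS = [0.90, 0.95, 0.98, 1.00]
PROMPTS = [
    "Write a personalized review of the book titled {title}:",
    "Write a book review for the book titled {title} as if you are {person}:",
]

def generation_configs():
    """Yield one config dict per (temperature, top_p, prompt) combination."""
    for temperature, top_p, template in product(TEMPERATURES, TOP_PS, PROMPTS):
        yield {
            "temperature": temperature,
            "top_p": top_p,
            "prompt_template": template,
            "do_sample": True,        # nucleus sampling requires sampling on
            "max_new_tokens": 500,    # matches the hardware row above
        }

configs = list(generation_configs())  # 5 temperatures x 4 top-p x 2 prompts = 40 runs
```

At roughly 10 H100-hours per run (per the hardware row), this grid implies on the order of 400 GPU-hours per open-source model for the book review task alone.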