Representative Language Generation
Authors: Charlotte Peale, Vinod Raman, Omer Reingold
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce representative generation, extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our findings provide a rigorous foundation for developing more diverse and representative generative models. ... Our results consider the feasibility of representative generation with respect to all three goals. For uniform and non-uniform generation, we focus on information-theoretic bounds, analogous to sample complexity results in learning theory, without considering computational efficiency. ... We prove a strong negative result, demonstrating the impossibility of achieving representative generation in the limit using only membership queries. |
| Researcher Affiliation | Academia | Charlotte Peale*¹, Vinod Raman*², Omer Reingold*¹ ... ¹Stanford University, ²University of Michigan. Correspondence to: Charlotte Peale <EMAIL>, Vinod Raman <EMAIL>. |
| Pseudocode | No | The paper describes algorithms and generators in prose, particularly in sections like 4.1 'Barriers to Achieving Representative Generation in the Limit with only Membership Queries' and its proof in Appendix D.4. It outlines steps for generators and constructions but does not present any formal pseudocode or algorithm blocks. For example, in the sketch for Theorem 4.4, it lists steps like '1. Given examples... 2. Let Ft Ct... 3. if Ft is empty... 4. Otherwise, let hn Ft...'. This is a description, not a structured pseudocode block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, links to code repositories, or mentions of code in supplementary materials. The work is theoretical and focuses on frameworks and proofs. |
| Open Datasets | No | The paper does not use or refer to any specific publicly available datasets for experimental evaluation. It uses conceptual examples like 'diverse set of animal pictures' but these are illustrative and not actual datasets used in experiments. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so it does not mention dataset splits (e.g., training, validation, or test splits). |
| Hardware Specification | No | The paper is entirely theoretical and does not report on any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and frameworks. It does not describe any implemented systems or experiments, and therefore, no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is a theoretical work providing frameworks, theorems, and proofs. It does not describe any empirical experiments, and consequently, there is no experimental setup, hyperparameters, or training configurations mentioned. |