reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Representative Language Generation

Authors: Charlotte Peale, Vinod Raman, Omer Reingold

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We introduce representative generation, extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our findings provide a rigorous foundation for developing more diverse and representative generative models. ... Our results consider the feasibility of representative generation with respect to all three goals. For uniform and non-uniform generation, we focus on information-theoretic bounds, analogous to sample complexity results in learning theory, without considering computational efficiency. ... We prove a strong negative result, demonstrating the impossibility of achieving representative generation in the limit using only membership queries.
Researcher Affiliation	Academia	Charlotte Peale*¹, Vinod Raman*², Omer Reingold*¹ ... ¹Stanford University, ²University of Michigan. Correspondence to: Charlotte Peale <EMAIL>, Vinod Raman <EMAIL>.
Pseudocode	No	The paper describes its algorithms and generators in prose, particularly in Section 4.1 ('Barriers to Achieving Representative Generation in the Limit with only Membership Queries') and the corresponding proof in Appendix D.4. It outlines the steps of its generators and constructions but does not present any formal pseudocode or algorithm blocks. For example, the proof sketch for Theorem 4.4 lists steps such as '1. Given examples... 2. Let F_t C_t... 3. if F_t is empty... 4. Otherwise, let h_n F_t...'. This is a prose description, not a structured pseudocode block.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, links to code repositories, or mentions of code in supplementary materials. The work is theoretical and focuses on frameworks and proofs.
Open Datasets	No	The paper does not use or refer to any specific publicly available datasets for experimental evaluation. It uses conceptual examples such as a 'diverse set of animal pictures', but these are illustrative, not datasets used in experiments.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with datasets, so it does not mention any dataset splits (e.g., training, validation, or test splits).
Hardware Specification	No	The paper is entirely theoretical and does not report on any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and focuses on mathematical proofs and frameworks. It does not describe any implemented systems or experiments, and therefore, no software dependencies with version numbers are mentioned.
Experiment Setup	No	The paper is a theoretical work providing frameworks, theorems, and proofs. It does not describe any empirical experiments, and consequently, there is no experimental setup, hyperparameters, or training configurations mentioned.