reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generative Social Choice: The Next Generation

Authors: Niclas Boehmer, Sara Fish, Ariel D. Procaccia

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present the Proportional Slate Engine (PROSE) and evaluate it in experiments. [...] We evaluate PROSE on four instances drawn from drug reviews and a deliberation hosted on Polis. [...] In each case, PROSE outperforms four baseline approaches with respect to both user satisfaction and proportionality. We present a quantitative evaluation of the generated slates in Table 1.
Researcher Affiliation	Academia	¹Hasso Plattner Institute, Germany; ²Harvard University, USA. Correspondence to: Niclas Boehmer <EMAIL>, Sara Fish <EMAIL>.
Pseudocode	Yes	Algorithm 1: Democratic Process_{C,f}(N, B, r)
Open Source Code	Yes	The code for PROSE and our other experiments is available at github.com/sara-fish/gen-soc-choice-next-gen.
Open Datasets	Yes	First, the publicly available UCI ML Drug Review dataset (Gräßer et al., 2018) [...] Second, the Bowling Green dataset is drawn from a public deliberation hosted on Polis (2023).
Dataset Splits	No	From this dataset, we create three subsampled instances (each with 80 agents): Birth Control (Balanced), which contains reviews of a birth control medication with all ratings appearing equally often; Birth Control (Imbalanced), which includes only birth control reviews with extreme and central ratings, i.e., (1, 2, 5, 9, 10); and Obesity, which contains reviews of an obesity medication with all ratings appearing equally often.
Hardware Specification	Yes	with runtimes of 31–65 minutes on a single Intel i7-8565U CPU @ 1.80GHz.
Software Dependencies	Yes	PROSE leverages GPT-4o when answering discriminative or generative queries. [...] We embed each agent using their description via OpenAI's embedding-3-large.
Experiment Setup	Yes	In particular, for the three drug review instances, we use C = [80, 70, 60, 50, 40, 36, 32, 28, 24, 20, 16, 12, 10, 8, 6, 4, 2], while for bowlinggreen, which has a different word budget per agent, we use C = [80, 60, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4]. Approval Levels: We use ℓ = [5.5, 5, 4.5, 4, 3.5, 3, 2, 1, 0] for each of the instances.
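
The values reported in the Experiment Setup row can be recorded as a small configuration for anyone attempting a rerun. The following is a minimal Python sketch, not the authors' code: the `InstanceSetup` class, its field names, the dictionary keys, and the `is_strictly_decreasing` helper are all hypothetical. Only the numeric values and the instance names come from the rows above. The strictly decreasing check reflects an observation about the reported lists and serves only as a guard against transcription errors.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class InstanceSetup:
    """Hypothetical container for the reported per-instance parameters."""
    name: str
    c_values: Tuple[int, ...]            # the list C quoted in the setup row
    approval_levels: Tuple[float, ...]   # the list ℓ quoted in the setup row


# Values copied from the Experiment Setup row.
DRUG_C = (80, 70, 60, 50, 40, 36, 32, 28, 24, 20, 16, 12, 10, 8, 6, 4, 2)
BOWLING_GREEN_C = (80, 60, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4)
APPROVAL_LEVELS = (5.5, 5, 4.5, 4, 3.5, 3, 2, 1, 0)

# Instance names as they appear in the Dataset Splits and Setup rows.
SETUPS: Dict[str, InstanceSetup] = {
    name: InstanceSetup(name, c_values, APPROVAL_LEVELS)
    for name, c_values in [
        ("Birth Control (Balanced)", DRUG_C),
        ("Birth Control (Imbalanced)", DRUG_C),
        ("Obesity", DRUG_C),
        ("bowlinggreen", BOWLING_GREEN_C),
    ]
}


def is_strictly_decreasing(values: Sequence[float]) -> bool:
    """Return True if every entry is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for setup in SETUPS.values():
        # Sanity check against copy errors; not a property claimed by the paper.
        assert is_strictly_decreasing(setup.c_values)
        assert is_strictly_decreasing(setup.approval_levels)
        print(f"{setup.name}: {len(setup.c_values)} values of C, "
              f"{len(setup.approval_levels)} approval levels")
```

Keeping the three drug review instances on one shared tuple makes it explicit that only bowlinggreen uses a different list C, as the row states.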