reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Representative Ranking for Deliberation in the Public Sphere

Authors: Manon Revel, Smitha Milli, Tyler Lu, Jamelle Watson-Daniels, Maximilian Nickel

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6. Experiments We empirically investigate the impact of ensuring JR through both real-world experiments and simulations. In the real-world setting we explore ranking comments on a collective response system (Konya et al., 2023b), where users shared their thoughts on campus protests and the right to assemble. The simulations investigate the implications of our theoretical findings based on the Mallows mixture model and can be found in Appendix D.2.
Researcher Affiliation	Collaboration	1Meta FAIR, New York, NY, USA 2Berkman Klein Center, Harvard University, Cambridge, MA, USA 3Meta, New York, NY, USA.
Pseudocode	Yes	Algorithm D.1 The Greedy CC algorithm proceeds in two stages. In the first stage, it identifies an n/k-justifying set by greedily selecting comments to maximize coverage. Once an n/k-justifying set is found, the algorithm proceeds to the second stage, where it fills any remaining slots with the comments with highest conversational norm scores.
Open Source Code	No	The paper mentions that "Our code builds upon the abcvoting Python package for approval-based multi-winner voting rules (Lackner et al., 2023)" and that "The data are available at https://github.com/akonya/polarized-issues-data." However, it does not provide an explicit statement or link for the source code of the methodology developed in this paper.
Open Datasets	Yes	Remesh dataset. We now investigate the impact of enforcing JR in a real-world setting: ranking comments about campus protests and the right to assemble. These comments were generated as part of two sessions on Remesh, a popularly-used collective response system (CRS) (Ovadya, 2023), each with about 300 participants (see Appendix E.1 for summary statistics about the dataset). Collective response systems are used by governments, companies, and non-profits to elicit the opinions of a target population at scale, in a more participatory way than traditional polling. In particular, on a CRS, participants give their opinion to different questions via free-form responses, and then vote on whether they agree with other participants’ comments. The data are available at https://github.com/akonya/polarized-issues-data.
Dataset Splits	No	The paper describes using the Remesh dataset and explains how approval inferences were thresholded, but it does not provide specific training/test/validation dataset splits. The dataset is used for evaluation without explicit partitioning into such splits for model training or testing.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments or simulations. It discusses software and platforms but not the underlying physical machines.
Software Dependencies	No	The paper states, "Our code builds upon the abcvoting Python package for approval-based multi-winner voting rules (Lackner et al., 2023)." While a specific package is named, a version number for this package is not explicitly provided. The paper also mentions "Google Jigsaw’s Perspective API" but without version details.
Experiment Setup	Yes	Experimental setup. We examine the impact of ranking the comments using the following three scoring functions, with and without a JR constraint. 1. Engagement f_eng (Eq. 2). 2. Maximin diverse approval f_DA where participants' self-reported political ideologies were used to form the three groups (left-, center-, and right-leaning users) that were considered in the function f_DA (see Eq. 3). 3. A Perspective API classifier f_C that scores how bridging a comment is based upon its text (Saltz et al., 2024). ... For each question in the Remesh dataset and each score function f, we compare the optimal set S(f) of k = 8 comments (without a JR constraint) and the set S_{Greedy JR}(f) which optimizes the score function while ensuring JR through the Greedy CC approximation algorithm (Alg. D.1). ... To investigate this, we created a Mallows mixture model M2([ϕ, ϕ], [σ1, σ2], [1/2, 1/2]) that simulates a polarized environment with two groups. ... As in Section 5.2, we generate approval votes for a user’s preference π Mγ([ϕ, ϕ], [σ1, σ2], [1/2, 1/2]) by thresholding and taking the top τ items to be approved.
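
The two-stage Greedy CC procedure quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' code (the report notes that none was released) and it does not use the abcvoting package. The function name `greedy_cc`, the input layout, the stopping rule (stop once no remaining comment is approved by at least n/k still-uncovered users), and the tie-breaking by score are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_cc(approvals: Dict[str, Set[int]], scores: Dict[str, float],
              n_users: int, k: int) -> List[str]:
    """Hypothetical two-stage selection of k comments.

    approvals maps each comment id to the set of user ids approving it;
    scores maps each comment id to its score under the chosen function f.
    """
    threshold = n_users / k  # assumed size of a group that must be covered
    selected: List[str] = []
    covered: Set[int] = set()

    # Stage 1: greedily add the comment that covers the most uncovered users,
    # until no remaining comment covers at least n/k of them (assumed rule).
    while len(selected) < k:
        remaining = [c for c in approvals if c not in selected]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: (len(approvals[c] - covered), scores[c]))
        if len(approvals[best] - covered) < threshold:
            break
        selected.append(best)
        covered |= approvals[best]

    # Stage 2: fill any remaining slots with the highest-scoring comments.
    rest = sorted((c for c in approvals if c not in selected),
                  key=lambda c: scores[c], reverse=True)
    selected.extend(rest[: k - len(selected)])
    return selected


# Toy data (made up for illustration): 6 users, 5 comments, k = 3.
example_approvals = {"a": {0, 1, 2}, "b": {0, 1}, "c": {3, 4},
                     "d": {5}, "e": set()}
example_scores = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.8, "e": 0.5}
example_selection = greedy_cc(example_approvals, example_scores, n_users=6, k=3)
```

On the toy data, stage 1 picks "a" and then "c", each covering at least n/k = 2 new users, and stage 2 fills the last slot with "b", the highest-scoring comment left. Any of the three score functions in the Experiment Setup row could in principle supply `scores`; the values here are invented.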