Selective Preference Aggregation
Authors: Shreyas Kadekodi, Hayden McTavish, Berk Ustun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct an extensive set of experiments on real-world datasets to benchmark our approach and demonstrate its functionality. Our results show how selective rankings can promote transparency and robustness by revealing disagreement and abstaining from arbitration. Section 5: Experiments. In this section, we present an empirical study of selective aggregation on real-world datasets. Our goal is to benchmark the properties and behavior of selective rankings with respect to existing approaches in terms of transparency, robustness, and versatility. |
| Researcher Affiliation | Academia | Shreyas Kadekodi¹*, Hayden McTavish²*, Berk Ustun¹. *Equal contribution. ¹UCSD, ²Duke University. Correspondence to: Berk Ustun <EMAIL>. UCSD and Duke University are academic institutions, and the email domain @ucsd.edu further confirms an academic affiliation. |
| Pseudocode | Yes | Algorithm 1 Selective Preference Aggregation. Algorithm 2 Solution Path Algorithm. |
| Open Source Code | Yes | We provide an open-source Python library for selective preference aggregation, available on GitHub and installable via pip install selectiverank. We include additional results in Appendix D, and code to reproduce our results on GitHub. |
| Open Datasets | Yes | We work with 5 preference datasets from different domains listed in Table 1. Each dataset encodes user preferences over items as votes, ratings, or rankings. We convert preferences to pairwise comparisons with ties and build rankings using our approach and baselines. Datasets: nba [49], survivor [51], lawschool [44], csrankings [11], sushi [39]. We also work with the DICES dataset [7]. |
| Dataset Splits | Yes | We randomly split users into two groups: a group of p_train = 5 users whose labels we use to train our model; and a group of p_test = 118 users whose labels we use to evaluate the predictions of the model at an individual level once it is deployed. All experiments used 5-fold cross-validation on the training split. |
| Hardware Specification | Yes | All results reflect timings on a consumer-grade 2.3 GHz CPU with 16 GB RAM. In our experiments, we are able to recover a certifiably optimal ranking quickly for 4/5 datasets using a commercial solver on a single-core CPU with 128 GB RAM. |
| Software Dependencies | Yes | We report results for an exact approach that handles ties and returns a certifiably optimal ranking by solving an integer program using CPLEX v22 [35]. |
| Experiment Setup | Yes | We fine-tuned a BERT-Mini model; all fine-tuning experiments used 5-fold cross-validation on the training split. We optimized with a learning rate of 2e-5 for up to 25 epochs, employing early stopping. We trained in mini-batches of size 16 and enabled oversampling of minority classes in each batch. |
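The Open Datasets row notes that preferences (votes, ratings, or rankings) are converted to pairwise comparisons with ties before aggregation. A minimal sketch of that conversion is below; the function name `to_pairwise` and the dict-of-ranks input format are illustrative assumptions, not the API of the paper's `selectiverank` library.

```python
from itertools import combinations
from collections import defaultdict

def to_pairwise(user_ranks):
    """Convert per-user rankings into pairwise comparison counts with ties.

    user_ranks: list of dicts mapping item -> rank (lower rank = preferred).
    Items a user did not rank are simply skipped for that user.
    Returns {(a, b): [a_wins, b_wins, ties]} with a < b lexicographically.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for ranks in user_ranks:
        # Compare every pair of items this user ranked.
        for a, b in combinations(sorted(ranks), 2):
            if ranks[a] < ranks[b]:
                counts[(a, b)][0] += 1   # a preferred to b
            elif ranks[a] > ranks[b]:
                counts[(a, b)][1] += 1   # b preferred to a
            else:
                counts[(a, b)][2] += 1   # tie: equal ranks
    return dict(counts)

# Two users: the first ties y and z, the second never ranked z.
pairs = to_pairwise([{"x": 1, "y": 2, "z": 2}, {"x": 2, "y": 1}])
```

Encoding ties explicitly, rather than breaking them arbitrarily, is what lets an aggregation method later distinguish genuine indifference from disagreement.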
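The Experiment Setup row reports mini-batches of size 16 with oversampling of minority classes in each batch. One common way to realize this is inverse-frequency weighted sampling with replacement (as in PyTorch's `WeightedRandomSampler`); the sketch below is an assumed illustration of that scheme, not the authors' training code, and the function name and weighting choice are hypothetical.

```python
import random

def oversampled_batches(labels, batch_size=16, n_batches=10, seed=0):
    """Yield mini-batches of example indices, oversampling minority classes.

    Each example is drawn with probability proportional to 1 / (frequency of
    its class), so every class contributes roughly equally per batch.
    Sampling is with replacement, so minority examples repeat across batches.
    """
    rng = random.Random(seed)
    freq = {c: labels.count(c) for c in set(labels)}
    weights = [1.0 / freq[y] for y in labels]
    for _ in range(n_batches):
        yield rng.choices(range(len(labels)), weights=weights, k=batch_size)

# A 90/10 imbalanced label set: batches should contain ~50% of each class.
labels = [0] * 90 + [1] * 10
batches = list(oversampled_batches(labels))
```

With inverse-frequency weights, the expected minority fraction per batch is 1/2 regardless of the original imbalance, which is why this pairs naturally with early stopping on a balanced validation metric.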