Hypothesis Testing for Generalized Thurstone Models
Authors: Anuran Makur, Japneet Singh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical findings through synthetic and real-world experiments, proposing a data-driven approach to determine the test threshold and using the test to determine different choice functions fit to the data (Section 4). |
| Researcher Affiliation | Academia | 1Department of Computer Science, Purdue University, West Lafayette, IN, USA 2Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA. Correspondence to: Anuran Makur <EMAIL>, Japneet Singh <EMAIL>. |
| Pseudocode | No | The paper describes methods mathematically and narratively but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about providing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | In our next experiment, we apply our test to the LMSYS chatbot leaderboard (Chiang et al., 2024), a widely used benchmark for evaluating the performance of LLMs. Additionally, we apply our testing procedure to historical NBA match outcomes using the publicly available dataset from Kaggle (Lauga, 2023). |
| Dataset Splits | Yes | First, we partition the observed comparison data $Z$ into two (roughly) equal parts $Z_1 = \{Z^m_{ij} : (i, j) \in E, m \in [\lceil k_{ij}/2 \rceil]\}$ and $Z_2 = Z \setminus Z_1$. The first half of the dataset $Z_1$ is used to estimate the parameters $\hat{w}$ as shown in (7). Then, we use $Z_2$ to calculate the test statistic $T$ via... |
| Hardware Specification | No | The paper mentions that experiments can be run "within 5 minutes on a normal CPU" but does not provide specific hardware details such as CPU models, GPUs, or memory specifications. |
| Software Dependencies | No | The paper describes algorithms and methods (e.g., gradient descent) but does not specify any software libraries, frameworks, or solvers with version numbers. |
| Experiment Setup | Yes | We considered values of $n$ ranging from 15 to 55 with intervals of 10, and set $k_{ij} = k$ for all $(i, j) \in E$ with $k \in \{12, 20\}$, graph topologies including complete graphs, $n \times n$ toroidal grids, and sparse graphs generated from Erdős–Rényi $G(n, p)$ models with parameter $p = 2\log^2(n)/n$, and $T_F$ models such as standard Thurstone (Case V) and BTL models. For each choice of parameters, we generated 400 models by randomly sampling weights with $b = F^{-1}(0.98)/2$ and generated synthetic comparison data. ...the estimation of $\hat{w}$ was performed using a standard gradient descent algorithm with a learning rate of 0.01 and for a maximum of 3000 iterations, or until the norm of the gradient was less than $10^{-5}$. |
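The sample-splitting step quoted under "Dataset Splits" can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the data structure (a dict mapping each compared pair to its list of outcomes) and the function name `split_comparisons` are assumptions.

```python
def split_comparisons(Z):
    """Split comparison data into two roughly equal halves:
    Z1 (first ceil(k_ij / 2) outcomes per pair) for parameter
    estimation, Z2 (the rest) for the test statistic.

    Z: dict mapping a pair (i, j) to a list of observed outcomes.
    """
    Z1, Z2 = {}, {}
    for pair, outcomes in Z.items():
        half = (len(outcomes) + 1) // 2  # ceil(k_ij / 2)
        Z1[pair] = outcomes[:half]
        Z2[pair] = outcomes[half:]
    return Z1, Z2
```

This keeps the two halves disjoint per pair, so the parameter estimate computed from `Z1` is independent of the data used to form the test statistic on `Z2`.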
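The estimation step described under "Experiment Setup" (gradient descent with learning rate 0.01, at most 3000 iterations, stopping when the gradient norm falls below 10⁻⁵) can be sketched for the BTL special case, where $P(i \text{ beats } j) = e^{w_i}/(e^{w_i} + e^{w_j})$. This is a minimal sketch under that assumption; the function name `fit_btl` and the `wins` dict format are illustrative, not from the paper.

```python
import math

def fit_btl(n, wins, lr=0.01, max_iters=3000, tol=1e-5):
    """Estimate BTL weights by gradient descent on the negative
    log-likelihood. `wins[(i, j)]` counts how often item i beat
    item j. Returns weights centered to sum to zero (the model is
    invariant to a common shift)."""
    w = [0.0] * n
    for _ in range(max_iters):
        grad = [0.0] * n
        for (i, j), c in wins.items():
            p = 1.0 / (1.0 + math.exp(w[j] - w[i]))  # P(i beats j)
            grad[i] -= c * (1.0 - p)  # d(-log lik)/dw_i
            grad[j] += c * (1.0 - p)
        if math.sqrt(sum(g * g for g in grad)) < tol:
            break  # gradient norm below 1e-5
        for a in range(n):
            w[a] -= lr * grad[a]
    mean = sum(w) / n
    return [x - mean for x in w]
```

Centering the weights fixes the identifiability of the model, since only weight differences affect the comparison probabilities.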