Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
Authors: Joe Suk, Arpit Agarwal
TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish the first optimal and adaptive dynamic regret upper bound O( L1/3K1/3T 2/3), where L is the unknown number of significant Borda winner switches. We also introduce a novel weighted Borda score framework which generalizes both the Borda and Condorcet problems. |
| Researcher Affiliation | Academia | Joe Suk EMAIL Columbia University Arpit Agarwal EMAIL Columbia University |
| Pseudocode | Yes | Algorithm 1: BOSSE(tstart, m0): (Weighted) BOrda Score Soft Elimination ... Algorithm 2: Meta-BOSSE |
| Open Source Code | No | Future work can also implement Algorithm 2 on real datasets and compare with other algorithms for nonstationary dueling bandits. |
| Open Datasets | No | The paper describes a theoretical framework for K-armed dueling bandits with a pairwise mean preference matrix and does not refer to any specific datasets for experimental evaluation. |
| Dataset Splits | No | The paper is theoretical and does not use any specific datasets for empirical evaluation, therefore no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |