Non-Stationary Dueling Bandits Under a Weighted Borda Criterion

Authors: Joe Suk, Arpit Agarwal

TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We establish the first optimal and adaptive dynamic regret upper bound O( L1/3K1/3T 2/3), where L is the unknown number of significant Borda winner switches. We also introduce a novel weighted Borda score framework which generalizes both the Borda and Condorcet problems.
Researcher Affiliation Academia Joe Suk EMAIL Columbia University Arpit Agarwal EMAIL Columbia University
Pseudocode Yes Algorithm 1: BOSSE(tstart, m0): (Weighted) BOrda Score Soft Elimination ... Algorithm 2: Meta-BOSSE
Open Source Code No Future work can also implement Algorithm 2 on real datasets and compare with other algorithms for nonstationary dueling bandits.
Open Datasets No The paper describes a theoretical framework for K-armed dueling bandits with a pairwise mean preference matrix and does not refer to any specific datasets for experimental evaluation.
Dataset Splits No The paper is theoretical and does not use any specific datasets for empirical evaluation, therefore no dataset splits are provided.
Hardware Specification No The paper is theoretical and does not describe any experimental hardware specifications.
Software Dependencies No The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations.