Biased Dueling Bandits with Stochastic Delayed Feedback

Authors: Bongsoo Yi, Yue Kang, Yao Li

TMLR 2024

Each entry below gives a reproducibility variable, its result, and the supporting LLM response.
Research Type: Experimental. "We provide a comprehensive regret analysis for the two proposed algorithms and then evaluate their empirical performance on both synthetic and real datasets. [...] We conduct an empirical evaluation of the performance of RUCB-Delay and MRR-DB-Delay using six synthetic and real-world datasets."
Researcher Affiliation: Academia. Bongsoo Yi (Department of Statistics and Operations Research, University of North Carolina at Chapel Hill); Yue Kang (Department of Statistics, University of California, Davis); Yao Li (Department of Statistics and Operations Research, University of North Carolina at Chapel Hill).
Pseudocode: Yes. "Algorithm 1 RUCB-Delay. Input: time horizon T, α, M, {τ_d}_{d=1}^M, A = {1, 2, ..., K}. Initialization: [...] Algorithm 2 Multi Round-Robin Dueling Bandit with Delayed Feedback (MRR-DB-Delay). Input: time horizon T, {n_m}_{m∈ℕ}. Initialization: γ_1 = 1/2, t = 1, m = 1, A_1 = {1, 2, ..., K}, T_ij(0) = ∅ for all i, j ∈ A_1."
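To make the pseudocode above concrete, here is a minimal, hedged sketch of an RUCB-style dueling-bandit loop with delayed feedback. It uses the standard RUCB optimism rule (empirical win rate plus a sqrt(α·ln t / n) bonus) and buffers each duel's outcome until its geometric delay elapses. All names (`rucb_delay_sketch`, `pending`, `wins`) and the fallback choices are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random
from collections import defaultdict

def sample_geometric(rng, p):
    """Geometric delay on {1, 2, ...} via inverse-CDF sampling."""
    u = rng.random()
    return max(1, int(math.ceil(math.log1p(-u) / math.log1p(-p))))

def rucb_delay_sketch(pref, T, alpha=1.0, delay_p=0.01, rng=None):
    """Sketch of an RUCB-style loop with delayed feedback.
    pref[i][j] is the probability that arm i beats arm j."""
    rng = rng or random.Random(0)
    K = len(pref)
    wins = [[0.0] * K for _ in range(K)]   # wins[i][j]: observed wins of i over j
    pending = defaultdict(list)            # arrival round -> [(winner, loser), ...]

    def ucb(i, j, t):
        n = wins[i][j] + wins[j][i]
        if n == 0:
            return 1.0                     # fully optimistic when no data yet
        return wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)

    for t in range(1, T + 1):
        # 1. Incorporate feedback whose delay has elapsed.
        for winner, loser in pending.pop(t, []):
            wins[winner][loser] += 1
        # 2. Champion: an arm whose UCB against every other arm is >= 1/2.
        champs = [i for i in range(K)
                  if all(ucb(i, j, t) >= 0.5 for j in range(K) if j != i)]
        c = rng.choice(champs) if champs else rng.randrange(K)
        # 3. Opponent: the arm optimistically most likely to beat the champion.
        d = max((j for j in range(K) if j != c), key=lambda j: ucb(j, c, t))
        # 4. Duel; the outcome only arrives after a geometric delay.
        winner, loser = (c, d) if rng.random() < pref[c][d] else (d, c)
        pending[t + sample_geometric(rng, delay_p)].append((winner, loser))
    return wins
```

Feedback still in `pending` when the horizon ends is simply never observed, which mirrors how delayed observations past T contribute nothing to the learner.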
Open Source Code: No. The paper does not provide any links to source code, explicit statements about code release, or mention of code in supplementary materials.
Open Datasets: Yes. Six rankers (K = 6): a preference matrix generated from the six retrieval functions within the full-text search engine of ArXiv.org (Yue & Joachims, 2011). MSLR (K = 5): a 5 × 5 preference matrix introduced by Zoghi et al. (2015a), extracted from a subset of rankers originating from the Microsoft Learning to Rank (MSLR) dataset (Qin & Liu, 2013). Tennis (K = 8): a dataset constructed by Ramamohan et al. (2016), based on the results of tennis matches organized by the Association of Tennis Professionals (ATP) among 8 international tennis players. [...] Car Preference (K = 10): a dataset of car preferences (Abbasnejad et al., 2013) collected from 60 users in the United States. [...] Sushi (K = 16): a dataset derived from the sushi preference dataset (Kamishima, 2003), comprising the preferences of 5,000 Japanese users for 100 different types of sushi; Komiyama et al. (2015; 2016) selected 16 sushi types from the dataset and represented them in a preference matrix.
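Each of these datasets reduces to a K × K preference matrix whose entry P[i][j] is the probability that arm i wins a duel against arm j. A minimal sketch of that data structure and a validity check (the matrix and numbers here are made up for illustration, not taken from any of the datasets above):

```python
# Illustrative 3 x 3 preference matrix: P[i][j] is the probability that
# arm i beats arm j, so P[i][j] + P[j][i] = 1 and P[i][i] = 0.5.
P = [
    [0.50, 0.65, 0.80],
    [0.35, 0.50, 0.70],
    [0.20, 0.30, 0.50],
]

def is_valid_preference_matrix(P, tol=1e-9):
    """Check the skew-symmetry constraint P[i][j] + P[j][i] = 1."""
    K = len(P)
    return all(abs(P[i][j] + P[j][i] - 1.0) < tol
               for i in range(K) for j in range(K))
```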
Dataset Splits: No. The paper describes using synthetic and real-world datasets and conducting 100 runs for regret assessment, but it does not specify training, validation, or test splits; the experimental setup centers on a time horizon for the bandit problem rather than explicit data partitioning.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper discusses various algorithms and theoretical analyses but does not list any specific software dependencies or their version numbers used for implementation or experimentation.
Experiment Setup: Yes. "For all experiments, we set the time horizon to T = 200,000. [...] Similar to Vernade et al. (2017; 2020), we assume that the delay distribution follows a geometric distribution with p = 0.01, implying a mean E[D] = 100. Also, based on our regret analysis in Theorem 2, we set α = 1.0 for RUCB-Delay. [...] We set the windowing parameter M = 1000."
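The delay model in this setup is easy to reproduce: a geometric distribution with p = 0.01 has mean 1/p = 100, matching E[D] = 100. A minimal sampler (assuming support on {1, 2, ...}; whether the paper counts delays from 0 or 1 is not stated in this excerpt):

```python
import random

def sample_delay(p=0.01, rng=random):
    """Sample a feedback delay D from a geometric distribution on {1, 2, ...}:
    each round, feedback arrives with probability p, so E[D] = 1/p = 100
    when p = 0.01, matching the experiment setup quoted above."""
    d = 1
    while rng.random() >= p:
        d += 1
    return d
```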