Piecewise-Stationary Dueling Bandits

Authors: Patrick Kolpaczki, Eyke Hüllermeier, Viktor Bengs

TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate Bt WR, MDB, and DETECT empirically on synthetic problem instances by comparing them to dueling bandits algorithms for the stationary setting w.r.t. their cumulative regret. [...] In a comparison with state-of-the-art methods, we provide empirical evidence for our algorithm s superiority.
Researcher Affiliation Academia Patrick Kolpaczki EMAIL Department of Computer Science Paderborn University; Eyke Hüllermeier EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning; Viktor Bengs EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning CIFAR Fellow
Pseudocode Yes Algorithm 1 Beat the Winner Reset (Bt WR) [...] Algorithm 2 Monitored Dueling Bandits (MDB) [...] Algorithm 3 Dueling Explore-Then-Exploit Changepoint Test (DETECT)
Open Source Code No The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. The OpenReview link provided is for paper review, not for code.
Open Datasets No We have generated for each chosen scenario 500 random problem instances, each consisting of a sequence of M randomly generated preference matrices. [...] All winning probabilities between pairs that do not include am are drawn uniformly at random from [0, 1]. The next matrix P (m+1) is manipulated such that at least one of am winning probabilities changes by δ.
Dataset Splits No The paper describes generating synthetic data for each scenario, rather than using a pre-existing dataset with specified splits. No information on training/test/validation splits is provided for reproduction in the conventional sense.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies No The paper does not mention any specific software or library versions used in the implementation or experimentation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes We have used ε = 0.1 and δ = 0.6 for most scenarios, but also investigate the algorithms regret curves depending on these quantities within a range of reasonable values. [...] Setting γ = (K 1) p Mw/8T, b = p w C/2, c = p 2C/w, and w to the smallest even integer 8C/δ2 with C = log( 2T (2T +1)/ MK), it holds that E h R S(T) i = O MT log(δT/KM) + R S M(Alg) . [...] Setting b = 2w log T, c = p 8 log T/w, and w to the lowest even integer 32 log T/δ2 , the expected cumulative binary weak regret of DETECT is bounded by E h R W(T) i = O KM log T + (1 p T )MT + R W M(Alg) .