Piecewise-Stationary Dueling Bandits
Authors: Patrick Kolpaczki, Eyke Hüllermeier, Viktor Bengs
TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Bt WR, MDB, and DETECT empirically on synthetic problem instances by comparing them to dueling bandits algorithms for the stationary setting w.r.t. their cumulative regret. [...] In a comparison with state-of-the-art methods, we provide empirical evidence for our algorithm s superiority. |
| Researcher Affiliation | Academia | Patrick Kolpaczki EMAIL Department of Computer Science Paderborn University; Eyke Hüllermeier EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning; Viktor Bengs EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning CIFAR Fellow |
| Pseudocode | Yes | Algorithm 1 Beat the Winner Reset (Bt WR) [...] Algorithm 2 Monitored Dueling Bandits (MDB) [...] Algorithm 3 Dueling Explore-Then-Exploit Changepoint Test (DETECT) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. The OpenReview link provided is for paper review, not for code. |
| Open Datasets | No | We have generated for each chosen scenario 500 random problem instances, each consisting of a sequence of M randomly generated preference matrices. [...] All winning probabilities between pairs that do not include am are drawn uniformly at random from [0, 1]. The next matrix P (m+1) is manipulated such that at least one of am winning probabilities changes by δ. |
| Dataset Splits | No | The paper describes generating synthetic data for each scenario, rather than using a pre-existing dataset with specified splits. No information on training/test/validation splits is provided for reproduction in the conventional sense. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not mention any specific software or library versions used in the implementation or experimentation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We have used ε = 0.1 and δ = 0.6 for most scenarios, but also investigate the algorithms regret curves depending on these quantities within a range of reasonable values. [...] Setting γ = (K 1) p Mw/8T, b = p w C/2, c = p 2C/w, and w to the smallest even integer 8C/δ2 with C = log( 2T (2T +1)/ MK), it holds that E h R S(T) i = O MT log(δT/KM) + R S M(Alg) . [...] Setting b = 2w log T, c = p 8 log T/w, and w to the lowest even integer 32 log T/δ2 , the expected cumulative binary weak regret of DETECT is bounded by E h R W(T) i = O KM log T + (1 p T )MT + R W M(Alg) . |