reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Piecewise-Stationary Dueling Bandits

Authors: Patrick Kolpaczki, Eyke Hüllermeier, Viktor Bengs

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Bt WR, MDB, and DETECT empirically on synthetic problem instances by comparing them to dueling bandits algorithms for the stationary setting w.r.t. their cumulative regret. [...] In a comparison with state-of-the-art methods, we provide empirical evidence for our algorithm s superiority.
Researcher Affiliation	Academia	Patrick Kolpaczki EMAIL Department of Computer Science Paderborn University; Eyke Hüllermeier EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning; Viktor Bengs EMAIL Institute of Informatics, LMU Munich Munich Center for Machine Learning CIFAR Fellow
Pseudocode	Yes	Algorithm 1 Beat the Winner Reset (Bt WR) [...] Algorithm 2 Monitored Dueling Bandits (MDB) [...] Algorithm 3 Dueling Explore-Then-Exploit Changepoint Test (DETECT)
Open Source Code	No	The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. The OpenReview link provided is for paper review, not for code.
Open Datasets	No	We have generated for each chosen scenario 500 random problem instances, each consisting of a sequence of M randomly generated preference matrices. [...] All winning probabilities between pairs that do not include am are drawn uniformly at random from [0, 1]. The next matrix P (m+1) is manipulated such that at least one of am winning probabilities changes by δ.
Dataset Splits	No	The paper describes generating synthetic data for each scenario, rather than using a pre-existing dataset with specified splits. No information on training/test/validation splits is provided for reproduction in the conventional sense.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies	No	The paper does not mention any specific software or library versions used in the implementation or experimentation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We have used ε = 0.1 and δ = 0.6 for most scenarios, but also investigate the algorithms regret curves depending on these quantities within a range of reasonable values. [...] Setting γ = (K 1) p Mw/8T, b = p w C/2, c = p 2C/w, and w to the smallest even integer 8C/δ2 with C = log( 2T (2T +1)/ MK), it holds that E h R S(T) i = O MT log(δT/KM) + R S M(Alg) . [...] Setting b = 2w log T, c = p 8 log T/w, and w to the lowest even integer 32 log T/δ2 , the expected cumulative binary weak regret of DETECT is bounded by E h R W(T) i = O KM log T + (1 p T )MT + R W M(Alg) .