Adaptation to the Range in K-Armed Bandits

Authors: Hédi Hadiji, Gilles Stoltz

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide some numerical experiments on synthetic data to illustrate the qualitative behavior of some popular algorithms like UCB strategies when they are incorrectly tuned, as opposed to strategies that are less sensitive to ignoring the range, or to the AHB strategy, which adapts to it. These experiments are only of an illustrative nature. On Figures 2 and 3 we plot the estimates R̂_T(α)/α of the rescaled regret as solid lines. The shaded areas correspond to 2 standard errors of the sequences R̂_T(α, n)/α.
Researcher Affiliation | Academia | Hédi Hadiji (EMAIL), Gilles Stoltz (EMAIL), Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, 91405 Orsay, France
Pseudocode | Yes | Algorithm 1 — AHB: AdaHedge for K-armed Bandits, with extra-exploration
Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a code repository. It refers to an "extended version of this article [arXiv:2006.03378]", which points to a preprint server, not a code repository.
Open Datasets | No | We provide some numerical experiments on synthetic data to illustrate the qualitative behavior of some popular algorithms like UCB strategies when they are incorrectly tuned, as opposed to strategies that are less sensitive to ignoring the range, or to the AHB strategy, which adapts to it. These experiments are only of an illustrative nature.
Dataset Splits | No | The paper describes a simulation setup for bandit problems rather than dataset splits such as training/validation/test sets.
Hardware Specification | No | The paper discusses numerical illustrations but does not provide specific hardware details, such as GPU models, CPU types, or memory, used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x).
Experiment Setup | Yes | The main algorithm of interest is, of course, the AHB strategy with extra-exploration (Algorithm 1), which we tune as stated in Theorem 7 with parameter 1/2. Following Auer et al. (2002a), we used the tuning ε_t = min{1, 5K/(d²t)} with d = 1/12. We consider instances of UCB (Auer et al., 2002a) using indices of the form μ̂_a(t) + s·√(2 ln T / N_a(t)), where s ∈ {0.01, 1, 100}.
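The two tunings quoted in the Experiment Setup row can be sketched in a few lines. This is our own illustrative Python, not the paper's code; the function names `epsilon_t` and `ucb_index` are ours.

```python
import math

def epsilon_t(t, K, d=1/12, c=5):
    """Epsilon-greedy exploration schedule of Auer et al. (2002a):
    eps_t = min(1, c*K / (d^2 * t)), with d = 1/12 as quoted above."""
    return min(1.0, c * K / (d ** 2 * t))

def ucb_index(mean_hat, n_pulls, T, s=1.0):
    """UCB index mu_hat_a(t) + s * sqrt(2 ln T / N_a(t)).
    The paper varies the scale s over {0.01, 1, 100} to probe mis-tuning."""
    return mean_hat + s * math.sqrt(2.0 * math.log(T) / n_pulls)
```

With s = 0.01 the confidence width is essentially ignored (near-greedy play), while s = 100 forces heavy over-exploration; s = 1 is the nominal tuning.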
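The pseudocode row refers to Algorithm 1, which builds on AdaHedge. Below is a minimal full-information AdaHedge sketch (ours, after de Rooij et al.); the paper's AHB additionally feeds importance-weighted bandit loss estimates and adds the extra-exploration step, both omitted here.

```python
import math

def adahedge_weights(L, Delta, K):
    """AdaHedge weights from cumulative losses L and cumulative mixability gap Delta."""
    m = min(L)
    if Delta == 0:  # learning rate eta = infinity: follow the leader(s)
        w = [1.0 if x == m else 0.0 for x in L]
    else:
        eta = math.log(K) / Delta
        w = [math.exp(-eta * (x - m)) for x in L]
    s = sum(w)
    return [x / s for x in w]

def adahedge(losses):
    """Run AdaHedge on a sequence of loss vectors; the learning rate is tuned
    online from the accumulated mixability gap, with no range knowledge."""
    K = len(losses[0])
    L = [0.0] * K          # cumulative per-arm losses
    Delta, total = 0.0, 0.0
    for loss in losses:
        w = adahedge_weights(L, Delta, K)
        h = sum(wi * li for wi, li in zip(w, loss))   # Hedge (expected) loss
        if Delta == 0:
            mix = min(li for wi, li in zip(w, loss) if wi > 0)
        else:
            eta = math.log(K) / Delta
            mix = -math.log(sum(wi * math.exp(-eta * li)
                                for wi, li in zip(w, loss))) / eta
        Delta += h - mix   # mixability gap of the round, always nonnegative
        total += h
        L = [x + li for x, li in zip(L, loss)]
    return total, L, Delta
```

The range-adaptation property comes from tuning η_t = ln(K)/Δ_{t-1} online rather than from any a priori bound on the losses.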
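The shaded-band construction quoted in the Research Type row (mean of the rescaled regret plus/minus 2 standard errors over repeated runs) can be sketched as follows. The data here are synthetic stand-ins generated for illustration, not the paper's results.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 100.0                                   # range parameter
n_runs = 1000
# Stand-in for the per-run regrets R_T(alpha, n), n = 1..n_runs.
regrets = rng.normal(loc=50.0, scale=5.0, size=n_runs)

rescaled = regrets / alpha                      # R_T(alpha, n) / alpha
mean = rescaled.mean()                          # solid line in the plots
se = rescaled.std(ddof=1) / np.sqrt(n_runs)     # standard error of the mean
lower, upper = mean - 2 * se, mean + 2 * se     # edges of the shaded area
```

Plotting `mean` as a solid line and filling between `lower` and `upper` reproduces the style of Figures 2 and 3.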