Adaptation to the Range in K-Armed Bandits
Authors: Hédi Hadiji, Gilles Stoltz
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide some numerical experiments on synthetic data to illustrate the qualitative behavior of some popular algorithms like UCB strategies when they are incorrectly tuned, as opposed to strategies that are less sensitive to ignoring the range, or to the AHB strategy, which adapts to it. These experiments are only of an illustrative nature. On Figures 2 and 3 we plot the estimates R̂_T(α)/α of the rescaled regret as solid lines. The shaded areas correspond to 2 standard errors of the sequences R̂_T(α, n)/α |
| Researcher Affiliation | Academia | Hédi Hadiji EMAIL Gilles Stoltz EMAIL Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, 91405, Orsay, France |
| Pseudocode | Yes | Algorithm 1 AHB: AdaHedge for K-armed Bandits, with extra-exploration |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a code repository. It refers to an "extended version of this article [arXiv:2006.03378]", which is a preprint server, not a code repository. |
| Open Datasets | No | We provide some numerical experiments on synthetic data to illustrate the qualitative behavior of some popular algorithms like UCB strategies when they are incorrectly tuned, as opposed to strategies that are less sensitive to ignoring the range, or to the AHB strategy, which adapts to it. These experiments are only of an illustrative nature. (Only synthetic data is used; no public dataset is released or referenced.) |
| Dataset Splits | No | The paper describes a simulation setup for bandit problems rather than dataset splits like training/validation/test sets. |
| Hardware Specification | No | The paper discusses numerical illustrations but does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x). |
| Experiment Setup | Yes | The main algorithm of interest is, of course, the AHB strategy with extra-exploration (Algorithm 1), which we tune as stated in Theorem 7 with parameter 1/2. Following Auer et al. (2002a), we used the tuning ε_t = min{1, 5K/(d²t)} with d = 1/12. We consider instances of UCB (Auer et al., 2002a) using indices of the form μ̂_a(t) + s·√(2 ln(T) / N_a(t)), where s ∈ {0.01, 1, 100}. |
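The UCB variants described in the Experiment Setup row can be sketched as follows. This is a minimal illustration of an index policy μ̂_a(t) + s·√(2 ln(T) / N_a(t)) with a scale parameter s, not the authors' code; the Bernoulli environment, function names, and seed are assumptions for the sake of a runnable example.

```python
import math
import random

def ucb_index(mean_est, n_pulls, horizon_T, s):
    """Index of the form mu_hat + s * sqrt(2 ln T / N_a), as in the quoted setup."""
    if n_pulls == 0:
        return float("inf")  # force each arm to be pulled at least once
    return mean_est + s * math.sqrt(2.0 * math.log(horizon_T) / n_pulls)

def run_ucb(arm_means, T, s, rng):
    """Play the scaled-UCB policy on Bernoulli arms; return the total reward."""
    K = len(arm_means)
    counts = [0] * K      # N_a(t): number of pulls of each arm
    means = [0.0] * K     # mu_hat_a(t): empirical mean of each arm
    total = 0.0
    for _ in range(T):
        a = max(range(K), key=lambda i: ucb_index(means[i], counts[i], T, s))
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return total

rng = random.Random(0)
reward = run_ucb([0.3, 0.6], T=2000, s=1.0, rng=rng)
print(reward)
```

Varying s over {0.01, 1, 100}, as the paper does, illustrates the sensitivity discussed in the Research Type row: a too-small scale under-explores and a too-large scale over-explores when the reward range is mis-specified.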