A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

Authors: Yasin Abbasi-Yadkori, András György, Nevena Lazić

JMLR 2023

Reproducibility variables (each listed as Variable: Result, followed by the LLM response):
Research Type: Theoretical. We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative reward of the learning algorithm. One way to measure the hardness of such environments is to consider how many times the identity of the optimal arm can change. We propose a method that achieves, in K-armed bandit problems, a near-optimal Õ(√(KN(S + 1))) dynamic regret, where N is the time horizon of the problem and S is the number of times the identity of the optimal arm changes, without prior knowledge of S. [...] The rest of the paper (Section 4) is devoted to the proof of the above results.
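To make the dynamic-regret definition above concrete, here is a minimal sketch on a toy piecewise-stationary two-armed instance with a single optimal-arm switch (S = 1). The instance, the reward means, and the naive always-pull-arm-0 policy are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical toy instance: K = 2 arms, horizon N = 1000, and one change
# of the optimal arm (S = 1) halfway through. Means are illustrative.
N, change = 1000, 500
means = np.array([[0.9, 0.1]] * change + [[0.1, 0.9]] * (N - change))

# A deliberately naive policy for illustration: always pull arm 0.
pulls = np.zeros(N, dtype=int)

# Dynamic regret: expected reward of the per-step optimal arm minus the
# expected reward of the arms actually pulled, summed over the horizon.
dynamic_regret = np.sum(means.max(axis=1) - means[np.arange(N), pulls])
print(dynamic_regret)  # 0 regret before the switch, 0.8 per step after
```

After the change point the policy keeps pulling the now-suboptimal arm, so the regret grows linearly in the second half of the horizon; algorithms with the paper's Õ(√(KN(S + 1))) guarantee avoid exactly this failure mode.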
Researcher Affiliation: Industry. Yasin Abbasi-Yadkori (Google DeepMind, London, UK); András György (Google DeepMind, London, UK); Nevena Lazić (Google DeepMind, Mountain View, USA).
Pseudocode: Yes. Algorithm 1: Arm Switch algorithm. Input: time horizon N, constant δ ∈ (0, 1) [...] Algorithm 2: the Elim_n(a′, a) subroutine.
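The report only names the paper's subroutines; as a generic illustration of the kind of confidence-interval test such elimination subroutines typically perform (this is a Hoeffding-style sketch under assumed [0, 1]-bounded rewards, not the paper's Elim subroutine), one could write:

```python
import math

def eliminates(rewards_a, rewards_b, delta):
    """Generic elimination check (illustrative): arm b eliminates arm a
    when b's lower confidence bound exceeds a's upper confidence bound."""
    n = min(len(rewards_a), len(rewards_b))
    # Hoeffding confidence radius for n i.i.d. rewards in [0, 1],
    # holding with probability at least 1 - delta.
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    mean_a = sum(rewards_a[:n]) / n
    mean_b = sum(rewards_b[:n]) / n
    return mean_b - radius > mean_a + radius
```

With clearly separated arms the test fires, e.g. `eliminates([0.1] * 100, [0.9] * 100, 0.05)` returns `True`, while two nearly equal arms are not separated at that sample size.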
Open Source Code: No. The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository; the text only describes the theoretical algorithm.
Open Datasets: No. The paper is theoretical, focusing on algorithm design and regret bounds for a multi-armed bandit problem. It does not involve empirical experiments using specific datasets, so no datasets are used or made available.
Dataset Splits: No. The paper is theoretical and presents an algorithm with mathematical regret bounds. It does not involve empirical experiments with datasets, and therefore no dataset splits are discussed.
Hardware Specification: No. The paper focuses on theoretical analysis and algorithm design for multi-armed bandits, not empirical evaluation, so no hardware specifications for running experiments are mentioned.
Software Dependencies: No. The paper presents a theoretical algorithm and its regret bounds without describing any implementation details or software dependencies with version numbers.
Experiment Setup: No. The paper is a theoretical work on multi-armed bandits, focusing on mathematical proofs and algorithm design. It does not include an experimental section with specific setup details or hyperparameters.