A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
Authors: Yasin Abbasi-Yadkori, András György, Nevena Lazić
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative reward of the learning algorithm. One way to measure the hardness of such environments is to consider how many times the identity of the optimal arm can change. We propose a method that achieves, in K-armed bandit problems, a near-optimal Õ(√(KN(S + 1))) dynamic regret, where N is the time horizon of the problem and S is the number of times the identity of the optimal arm changes, without prior knowledge of S. [...] The rest of the paper (Section 4) is devoted to the proof of the above results. |
| Researcher Affiliation | Industry | Yasin Abbasi-Yadkori EMAIL Google DeepMind, London, UK András György EMAIL Google DeepMind, London, UK Nevena Lazić EMAIL Google DeepMind, Mountain View, USA |
| Pseudocode | Yes | Algorithm 1 Arm Switch algorithm. Input: time horizon N, constant δ ∈ (0, 1) [...] Algorithm 2 The Elim_n(a′, a) subroutine |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. The text only describes the theoretical algorithm. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds for a multi-armed bandit problem. It does not involve empirical experiments using specific datasets. Thus, no datasets are used or made available. |
| Dataset Splits | No | The paper is theoretical and presents an algorithm with mathematical regret bounds. It does not involve empirical experiments with datasets, and therefore no dataset splits are discussed. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design for multi-armed bandits, not empirical evaluations. Therefore, there is no mention of hardware specifications for running experiments. |
| Software Dependencies | No | The paper presents a theoretical algorithm and its regret bounds without describing any implementation details or software dependencies with version numbers. |
| Experiment Setup | No | The paper is a theoretical work on multi-armed bandits, focusing on mathematical proofs and algorithm design. It does not include an experimental section with specific setup details or hyperparameters. |
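The dynamic-regret notion quoted above (the gap between a per-round oracle that always plays the currently optimal arm and the learner's choices) can be made concrete with a minimal sketch. This is illustrative only and is not the paper's Arm Switch algorithm; the function name, the two-arm example, and the single switch point (S = 1) are all assumptions chosen for the example.

```python
def dynamic_regret(means_per_round, choices):
    """Dynamic (pseudo-)regret: cumulative gap between the best mean
    reward available in each round and the mean reward of the arm the
    learner actually chose in that round."""
    return sum(max(means) - means[a]
               for means, a in zip(means_per_round, choices))

# Illustrative example: K = 2 arms over N = 10 rounds; the optimal
# arm switches once halfway through (so S = 1).
means_per_round = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5

oracle = [0] * 5 + [1] * 5   # tracks the optimal arm -> zero regret
static = [0] * 10            # never adapts -> pays the gap after the switch

print(dynamic_regret(means_per_round, oracle))
print(dynamic_regret(means_per_round, static))  # ≈ 5 * (0.9 - 0.1) = 4.0
```

A static (stationary) regret notion would compare against the single best fixed arm in hindsight; the dynamic benchmark is strictly harder, which is why the bound above grows with the number of optimal-arm switches S.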