A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
Authors: Yasin Abbasi-Yadkori, András György, Nevena Lazić
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative reward of the learning algorithm. One way to measure the hardness of such environments is to consider how many times the identity of the optimal arm can change. We propose a method that achieves, in K-armed bandit problems, a near-optimal Õ(√(KN(S + 1))) dynamic regret, where N is the time horizon of the problem and S is the number of times the identity of the optimal arm changes, without prior knowledge of S. [...] The rest of the paper (Section 4) is devoted to the proof of the above results. |
| Researcher Affiliation | Industry | Yasin Abbasi-Yadkori EMAIL Google DeepMind, London, UK András György EMAIL Google DeepMind, London, UK Nevena Lazić EMAIL Google DeepMind, Mountain View, USA |
| Pseudocode | Yes | Algorithm 1 Arm Switch algorithm. Input: time horizon N, constant δ ∈ (0, 1) [...] Algorithm 2 The Elim_n(a′, a) subroutine |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. The text only describes the theoretical algorithm. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds for a multi-armed bandit problem. It does not involve empirical experiments using specific datasets. Thus, no datasets are used or made available. |
| Dataset Splits | No | The paper is theoretical and presents an algorithm with mathematical regret bounds. It does not involve empirical experiments with datasets, and therefore no dataset splits are discussed. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design for multi-armed bandits, not empirical evaluations. Therefore, there is no mention of hardware specifications for running experiments. |
| Software Dependencies | No | The paper presents a theoretical algorithm and its regret bounds without describing any implementation details or software dependencies with version numbers. |
| Experiment Setup | No | The paper is a theoretical work on multi-armed bandits, focusing on mathematical proofs and algorithm design. It does not include an experimental section with specific setup details or hyperparameters. |
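The dynamic-regret notion quoted above (the gap between a per-round oracle that always plays the currently optimal arm and the learner's choices) can be made concrete with a minimal sketch. This is illustrative only and is not the paper's Arm Switch algorithm; the function name, the two-arm example, and the single switch point (S = 1) are all assumptions chosen for the example.

```python
def dynamic_regret(means_per_round, choices):
    """Dynamic (pseudo-)regret: cumulative gap between the best mean
    reward available in each round and the mean reward of the arm the
    learner actually chose in that round."""
    return sum(max(means) - means[a]
               for means, a in zip(means_per_round, choices))

# Illustrative example: K = 2 arms over N = 10 rounds; the optimal
# arm switches once halfway through (so S = 1).
means_per_round = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5

oracle = [0] * 5 + [1] * 5   # tracks the optimal arm -> zero regret
static = [0] * 10            # never adapts -> pays the gap after the switch

print(dynamic_regret(means_per_round, oracle))
print(dynamic_regret(means_per_round, static))  # ≈ 5 * (0.9 - 0.1) = 4.0
```

A static (stationary) regret notion would compare against the single best fixed arm in hindsight; the dynamic benchmark is strictly harder, which is why the bound above grows with the number of optimal-arm switches S.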