reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bandit problems with fidelity rewards

Authors: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose two models for ﬁdelity. In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played. In the subscription model the additional reward depends on the current number of consecutive draws of the arm. We consider both stochastic and adversarial problems. Since single-arm strategies are not always optimal in stochastic problems, the notion of regret in the adversarial setting needs careful adjustment. We introduce three possible notions of regret and investigate which can be bounded sublinearly. We study in detail the special cases of increasing, decreasing and coupon (where the player gets an additional reward after every m plays of an arm) ﬁdelity rewards. For the models which do not necessarily enjoy sublinear regret, we provide a worst case lower bound. For those models which exhibit sublinear regret, we provide algorithms and bound their regret. Keywords: multi-armed bandit problem, ﬁdelity reward, regret minimization
Researcher Affiliation	Collaboration	G abor Lugosi EMAIL ICREA Barcelona, Spain, and Department of Economics and Business Pompeu Fabra University Barcelona, Spain, and Barcelona Graduate School of Economics Barcelona, Spain Ciara Pike-Burke EMAIL Department of Mathematics Imperial College London London, UK Pierre-Andr e Savalle EMAIL Cisco Systems, Inc. Paris, France
Pseudocode	Yes	Figure 1: EXP4 with pseudo rewards for the adversarial loyalty points bandits model with decreasing ﬁdelity reward. ... Figure 2: Lazy strategy for adversarial bandits with decreasing ﬁdelity rewards, subscription model
Open Source Code	No	The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets	No	This is a theoretical paper that does not use or reference any specific datasets for experimental evaluation.
Dataset Splits	No	This is a theoretical paper that does not involve experimental evaluation with datasets, hence no dataset splits are provided.
Hardware Specification	No	This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for experiments.
Software Dependencies	No	This is a theoretical paper that does not involve experimental implementation, and therefore does not list any specific software dependencies with version numbers.
Experiment Setup	No	This is a theoretical paper that does not describe an experimental setup with hyperparameters or training configurations.