Bandit problems with fidelity rewards

Authors: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose two models for fidelity. In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played. In the subscription model the additional reward depends on the current number of consecutive draws of the arm. We consider both stochastic and adversarial problems. Since single-arm strategies are not always optimal in stochastic problems, the notion of regret in the adversarial setting needs careful adjustment. We introduce three possible notions of regret and investigate which can be bounded sublinearly. We study in detail the special cases of increasing, decreasing and coupon (where the player gets an additional reward after every m plays of an arm) fidelity rewards. For the models which do not necessarily enjoy sublinear regret, we provide a worst case lower bound. For those models which exhibit sublinear regret, we provide algorithms and bound their regret. Keywords: multi-armed bandit problem, fidelity reward, regret minimization
Researcher Affiliation Collaboration G abor Lugosi EMAIL ICREA Barcelona, Spain, and Department of Economics and Business Pompeu Fabra University Barcelona, Spain, and Barcelona Graduate School of Economics Barcelona, Spain Ciara Pike-Burke EMAIL Department of Mathematics Imperial College London London, UK Pierre-Andr e Savalle EMAIL Cisco Systems, Inc. Paris, France
Pseudocode Yes Figure 1: EXP4 with pseudo rewards for the adversarial loyalty points bandits model with decreasing fidelity reward. ... Figure 2: Lazy strategy for adversarial bandits with decreasing fidelity rewards, subscription model
Open Source Code No The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets No This is a theoretical paper that does not use or reference any specific datasets for experimental evaluation.
Dataset Splits No This is a theoretical paper that does not involve experimental evaluation with datasets, hence no dataset splits are provided.
Hardware Specification No This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for experiments.
Software Dependencies No This is a theoretical paper that does not involve experimental implementation, and therefore does not list any specific software dependencies with version numbers.
Experiment Setup No This is a theoretical paper that does not describe an experimental setup with hyperparameters or training configurations.