Bandit problems with fidelity rewards
Authors: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle
JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose two models for fidelity. In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played. In the subscription model the additional reward depends on the current number of consecutive draws of the arm. We consider both stochastic and adversarial problems. Since single-arm strategies are not always optimal in stochastic problems, the notion of regret in the adversarial setting needs careful adjustment. We introduce three possible notions of regret and investigate which can be bounded sublinearly. We study in detail the special cases of increasing, decreasing and coupon (where the player gets an additional reward after every m plays of an arm) fidelity rewards. For the models which do not necessarily enjoy sublinear regret, we provide a worst case lower bound. For those models which exhibit sublinear regret, we provide algorithms and bound their regret. Keywords: multi-armed bandit problem, fidelity reward, regret minimization |
| Researcher Affiliation | Collaboration | G abor Lugosi EMAIL ICREA Barcelona, Spain, and Department of Economics and Business Pompeu Fabra University Barcelona, Spain, and Barcelona Graduate School of Economics Barcelona, Spain Ciara Pike-Burke EMAIL Department of Mathematics Imperial College London London, UK Pierre-Andr e Savalle EMAIL Cisco Systems, Inc. Paris, France |
| Pseudocode | Yes | Figure 1: EXP4 with pseudo rewards for the adversarial loyalty points bandits model with decreasing fidelity reward. ... Figure 2: Lazy strategy for adversarial bandits with decreasing fidelity rewards, subscription model |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | This is a theoretical paper that does not use or reference any specific datasets for experimental evaluation. |
| Dataset Splits | No | This is a theoretical paper that does not involve experimental evaluation with datasets, hence no dataset splits are provided. |
| Hardware Specification | No | This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper that does not involve experimental implementation, and therefore does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper that does not describe an experimental setup with hyperparameters or training configurations. |