Nonstochastic Bandits with Composite Anonymous Feedback

Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order √((d + 1)KT), where d is the maximum delay, K is the number of arms, and T is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.
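The composite anonymous feedback model described in the abstract can be illustrated with a minimal simulation. This is a hypothetical sketch: it spreads each loss uniformly over d + 1 rounds, whereas the paper allows an arbitrary adversarial split, and the random arm choice stands in for an actual bandit policy.

```python
import random

def composite_feedback_sim(K=3, d=2, T=10, seed=0):
    """Simulate composite anonymous feedback: the loss of the action
    played at round t is spread over rounds t, ..., t + d, and the
    player only observes the per-round aggregate, never the individual
    components (uniform spread is an illustrative assumption)."""
    rng = random.Random(seed)
    # losses[t][a]: full loss of arm a at round t, fixed obliviously here
    losses = [[rng.random() for _ in range(K)] for _ in range(T)]
    pending = [0.0] * (T + d + 1)   # accumulated components per round
    observed = []
    for t in range(T):
        a = rng.randrange(K)        # stand-in for the player's policy
        for j in range(d + 1):      # spread the loss over d + 1 rounds
            pending[t + j] += losses[t][a] / (d + 1)
        observed.append(pending[t]) # player sees only this aggregate
    return observed

obs = composite_feedback_sim()
```

With d = 0 this reduces to the standard bandit setting, where the observed value at round t is exactly the loss of the arm just played.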
Researcher Affiliation | Collaboration | Nicolò Cesa-Bianchi (EMAIL), Dept. of Computer Science and DSRC, Università degli Studi di Milano, Italy; Tommaso Cesari (EMAIL), Institut de Mathématiques de Toulouse (IMT), Paul Sabatier University (UT3), Toulouse, France, and Toulouse School of Economics (TSE), Toulouse, France; Roberto Colomboni (EMAIL), Istituto Italiano di Tecnologia (IIT), Genova, Italy, and Università degli Studi di Milano, Italy; Claudio Gentile (EMAIL), Google Research, New York, USA; Yishay Mansour (EMAIL), Tel Aviv University, Tel Aviv, Israel, and Google Research, Tel Aviv, Israel
Pseudocode | Yes | Algorithm 1: Composite Loss Wrapper; Algorithm 2: Follow The Regularized Leader (FTRL) with (1/2)-Tsallis entropy
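The FTRL step with (1/2)-Tsallis entropy admits a well-known closed-form structure: the distribution minimizing ⟨p, L⟩ − (1/η) Σᵢ 2√pᵢ over the simplex satisfies pᵢ = (η(Lᵢ − x))⁻² for a normalizing scalar x < min(L). The sketch below finds x by bisection; it is a generic illustration of this standard update, not the authors' implementation, and the names are chosen here for clarity.

```python
def tsallis_ftrl_dist(L, eta):
    """Return the FTRL distribution with (1/2)-Tsallis entropy regularizer
    for cumulative (estimated) losses L and learning rate eta.
    Stationarity gives p_i = (eta * (L_i - x))**-2 with x < min(L) chosen
    so the probabilities sum to 1; x is located by bisection."""
    K = len(L)
    lo = min(L) - K / eta      # here each term <= 1/K**2, so mass <= 1/K
    hi = min(L) - 1e-12        # x must stay strictly below min(L)

    def mass(x):               # total probability mass; increasing in x
        return sum((eta * (li - x)) ** -2 for li in L)

    for _ in range(100):       # bisection on the monotone mass function
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return [(eta * (li - x)) ** -2 for li in L]
```

For equal losses the update returns the uniform distribution, and arms with smaller cumulative loss receive more mass, as expected of an FTRL step.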
Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release in supplementary materials. The paper is theoretical.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs within a bandit framework. It does not conduct experiments using specific datasets, thus no dataset access information is provided.
Dataset Splits | No | The paper does not conduct empirical studies with datasets, so there is no mention of dataset splits (e.g., training, test, validation splits) for reproducibility.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not mention any specific software dependencies or versions required to implement or run experiments.
Experiment Setup | No | The paper is theoretical and presents algorithms and their theoretical guarantees (regret bounds, stability). It does not describe any specific experimental setup, hyperparameters, or training configurations.