Nonstochastic Bandits with Composite Anonymous Feedback

Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order √((d + 1)KT), where d is the maximum delay, K is the number of arms, and T is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.
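The composite anonymous feedback model described in the abstract can be illustrated with a minimal simulation. This is a hypothetical sketch: it spreads each loss uniformly over d + 1 rounds, whereas the paper allows an arbitrary adversarial split, and the random arm choice stands in for an actual bandit policy.

```python
import random

def composite_feedback_sim(K=3, d=2, T=10, seed=0):
    """Simulate composite anonymous feedback: the loss of the action
    played at round t is spread over rounds t, ..., t + d, and the
    player only observes the per-round aggregate, never the individual
    components (uniform spread is an illustrative assumption)."""
    rng = random.Random(seed)
    # losses[t][a]: full loss of arm a at round t, fixed obliviously here
    losses = [[rng.random() for _ in range(K)] for _ in range(T)]
    pending = [0.0] * (T + d + 1)   # accumulated components per round
    observed = []
    for t in range(T):
        a = rng.randrange(K)        # stand-in for the player's policy
        for j in range(d + 1):      # spread the loss over d + 1 rounds
            pending[t + j] += losses[t][a] / (d + 1)
        observed.append(pending[t]) # player sees only this aggregate
    return observed

obs = composite_feedback_sim()
```

With d = 0 this reduces to the standard bandit setting, where the observed value at round t is exactly the loss of the arm just played.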
Researcher Affiliation | Collaboration | Nicolò Cesa-Bianchi (EMAIL), Dept. of Computer Science and DSRC, Università degli Studi di Milano, Italy; Tommaso Cesari (EMAIL), Institut de Mathématiques de Toulouse (IMT), Paul Sabatier University (UT3), Toulouse, France, and Toulouse School of Economics (TSE), Toulouse, France; Roberto Colomboni (EMAIL), Istituto Italiano di Tecnologia (IIT), Genova, Italy, and Università degli Studi di Milano, Italy; Claudio Gentile (EMAIL), Google Research, New York, USA; Yishay Mansour (EMAIL), Tel Aviv University, Tel Aviv, Israel, and Google Research, Tel Aviv, Israel
Pseudocode | Yes | Algorithm 1: Composite Loss Wrapper; Algorithm 2: Follow The Regularized Leader (FTRL) with (1/2)-Tsallis entropy
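The FTRL step with (1/2)-Tsallis entropy admits a well-known closed-form structure: the distribution minimizing ⟨p, L⟩ − (1/η) Σᵢ 2√pᵢ over the simplex satisfies pᵢ = (η(Lᵢ − x))⁻² for a normalizing scalar x < min(L). The sketch below finds x by bisection; it is a generic illustration of this standard update, not the authors' implementation, and the names are chosen here for clarity.

```python
def tsallis_ftrl_dist(L, eta):
    """Return the FTRL distribution with (1/2)-Tsallis entropy regularizer
    for cumulative (estimated) losses L and learning rate eta.
    Stationarity gives p_i = (eta * (L_i - x))**-2 with x < min(L) chosen
    so the probabilities sum to 1; x is located by bisection."""
    K = len(L)
    lo = min(L) - K / eta      # here each term <= 1/K**2, so mass <= 1/K
    hi = min(L) - 1e-12        # x must stay strictly below min(L)

    def mass(x):               # total probability mass; increasing in x
        return sum((eta * (li - x)) ** -2 for li in L)

    for _ in range(100):       # bisection on the monotone mass function
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return [(eta * (li - x)) ** -2 for li in L]
```

For equal losses the update returns the uniform distribution, and arms with smaller cumulative loss receive more mass, as expected of an FTRL step.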
Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release in supplementary materials. The paper is theoretical.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs within a bandit framework. It does not conduct experiments using specific datasets, thus no dataset access information is provided.
Dataset Splits | No | The paper does not conduct empirical studies with datasets, so there is no mention of dataset splits (e.g., training, test, validation splits) for reproducibility.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not mention any specific software dependencies or versions required to implement or run experiments.
Experiment Setup | No | The paper is theoretical and presents algorithms and their theoretical guarantees (regret bounds, stability). It does not describe any specific experimental setup, hyperparameters, or training configurations.