Adversarial Bandits Against Arbitrary Strategies

Authors: Jung-hun Kim, Se-Young Yun

Venue: TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study adversarial bandit problems against arbitrarily switching arms. Crucially, we allow the number of switches S to be unknown to the agent, thereby targeting arbitrary strategies without prior knowledge of their complexity. To address this setting, we adopt the master-base framework combined with the online mirror descent (OMD) method... We begin by analyzing a master-base algorithm that employs OMD with a negative entropy regularizer, and show that it achieves a regret bound of $O(S^{1/2} K^{1/3} T^{2/3})$... This refinement leads to an improved regret bound of $O(\min\{\sqrt{SKT\rho}, S\sqrt{KT}\})$ with respect to T, where ρ captures the variance associated with a comparator strategy.
Researcher Affiliation | Academia | Jung-hun Kim (CREST, ENSAE, IP Paris, Fair Play joint team, France); Se-Young Yun (KAIST AI, South Korea)
Pseudocode | Yes | Algorithm 1 Master-base OMD... Algorithm 2 Master-base OMD with adaptive learning rates
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only discusses implementation details in Remark 3.6.
Open Datasets | No | This is a theoretical paper. No specific datasets are used or referenced for empirical evaluation.
Dataset Splits | No | This is a theoretical paper and does not involve empirical evaluation on datasets, thus no dataset splits are provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not provide specific software dependencies with version numbers for experimental reproducibility. Remark 3.6 mentions general implementation aspects without specific versioned components used in experiments.
Experiment Setup | No | The paper is theoretical and does not present experimental results, therefore no experimental setup details or hyperparameters are provided.
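
The Research Type entry mentions OMD with a negative entropy regularizer. As a rough illustration only, the sketch below shows the standard closed form of that update on the probability simplex (an exponential-weights step with an importance-weighted loss estimate). It is not the paper's Algorithm 1 or Algorithm 2: there is no master layer, no handling of switches, and the learning rate `eta` is fixed. The function names `omd_negentropy_step` and `run_bandit` are hypothetical and chosen here for illustration.

```python
import math
import random


def omd_negentropy_step(probs, arm, loss, eta):
    """One OMD step with the negative-entropy regularizer on the simplex.

    With this regularizer the update has the closed form
    new_p[i] proportional to p[i] * exp(-eta * loss_estimate[i]).
    """
    # Importance-weighted estimate: only the pulled arm is nonzero.
    estimate = [0.0] * len(probs)
    estimate[arm] = loss / probs[arm]
    unnormalized = [p * math.exp(-eta * l) for p, l in zip(probs, estimate)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_bandit(losses, eta, seed=0):
    """Play a fixed loss sequence (one list of per-arm losses per round)."""
    rng = random.Random(seed)
    num_arms = len(losses[0])
    probs = [1.0 / num_arms] * num_arms
    for loss_vector in losses:
        # Sample an arm from the current distribution; observe only its loss.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        probs = omd_negentropy_step(probs, arm, loss_vector[arm], eta)
    return probs


if __name__ == "__main__":
    # Assumed toy instance: arm 0 always incurs loss 1, arm 1 always loss 0.
    final = run_bandit([[1.0, 0.0]] * 200, eta=0.1)
    print(final)
```

In this toy run the probability mass can only move away from the lossy arm, since rounds with zero observed loss leave the distribution unchanged.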