Adversarial Combinatorial Semi-bandits with Graph Feedback

Authors: Yuxiao Wen

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We establish that the optimal regret over a time horizon T scales as eΘ(SαST), where S is the size of the combinatorial decisions and α is the independence number of G. This result interpolates between the known regrets eΘ(ST) under full information (i.e., G is complete) and eΘ(KST) under the semi-bandit feedback (i.e., G has only self-loops), where K is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations. We also show that online stochastic mirror descent (OSMD) that only realizes convexified actions in expectation is suboptimal. In addition, we describe the problem of combinatorial semi-bandits with general capacity and apply our results to derive an improved regret upper bound, which may be of independent interest.
Researcher Affiliation Academia 1Courant Institute of Mathematical Sciences, New York University, New York, USA. Correspondence to: Yuxiao Wen <EMAIL>.
Pseudocode Yes Algorithm 1 Online Stochastic Mirror Descent under Graph Feedback (OSMD-G) Algorithm 2 Randomized Swap Rounding Algorithm 3 Combinatorial Arm Elimination
Open Source Code No The paper does not contain any explicit statements about the release of source code, links to code repositories, or mention of code in supplementary materials.
Open Datasets No The paper is theoretical and focuses on developing algorithms and proving regret bounds. It does not conduct empirical studies using specific datasets, and therefore, does not provide concrete access information for any open datasets.
Dataset Splits No The paper is theoretical and does not involve empirical experiments with datasets. Consequently, there is no mention of dataset splits for training, validation, or testing.
Hardware Specification No This paper is theoretical, presenting algorithms and proving regret bounds. It does not include any experimental results or describe a computational setup, and thus no hardware specifications are provided.
Software Dependencies No The paper is theoretical, focusing on algorithms and proofs. It does not describe any computational experiments or implementations, and therefore does not list any specific software dependencies with version numbers.
Experiment Setup No This paper is theoretical and presents algorithms with their regret bounds. It does not include any experimental validation or implementation details, and as such, no experimental setup details or hyperparameters are provided.