Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

Authors: Zihan Zhang, Xiangyang Ji, Yuan Zhou

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study the optimal batch-regret tradeoff for batch linear contextual bandits. For this problem, we design batch learning algorithms and prove that they achieve the optimal regret bounds (up to logarithmic factors) for any batch number M, number of actions K, time horizon T, and dimension d. Therefore, we establish the full-parameter-range almost optimal batch-regret tradeoff for the batch linear contextual bandit problem. Along our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in literature and maybe of independent interest.
Researcher Affiliation Academia Paul G. Allen School of CSE, University of Washington. Email: EMAIL Department of Automation, Tsinghua University. Email: EMAIL Yau Mathematical Sciences Center, Tsinghua University and Beijing Institute of Mathematical Sciences and Applications. Email: EMAIL
Pseudocode Yes Algorithm 1 Batch Learning ... Algorithm 2 Exp Policy
Open Source Code No The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets No The paper focuses on theoretical analysis of batch linear contextual bandits with stochastic contexts. It does not mention the use of any specific publicly available datasets for experimental evaluation, nor does it provide access information for any datasets.
Dataset Splits No The paper is theoretical and does not conduct empirical experiments on specific datasets, therefore, it does not provide any information about dataset splits for reproducibility.
Hardware Specification No The paper presents theoretical contributions on algorithm design and regret analysis for batch linear contextual bandits. It does not describe any experimental evaluations that would require hardware specifications.
Software Dependencies No This is a theoretical paper focusing on mathematical proofs and algorithm design. It does not describe any empirical implementation or experiments, and therefore does not list any software dependencies with version numbers.
Experiment Setup No The paper is theoretical in nature, presenting algorithm designs and mathematical proofs. It does not include an experimental setup section with specific hyperparameters or training configurations.