Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Authors: Zihan Zhang, Xiangyang Ji, Yuan Zhou
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the optimal batch-regret tradeoff for batch linear contextual bandits. For this problem, we design batch learning algorithms and prove that they achieve the optimal regret bounds (up to logarithmic factors) for any batch number M, number of actions K, time horizon T, and dimension d. Therefore, we establish the full-parameter-range almost optimal batch-regret tradeoff for the batch linear contextual bandit problem. Along our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in literature and maybe of independent interest. |
| Researcher Affiliation | Academia | Paul G. Allen School of CSE, University of Washington. Email: EMAIL Department of Automation, Tsinghua University. Email: EMAIL Yau Mathematical Sciences Center, Tsinghua University and Beijing Institute of Mathematical Sciences and Applications. Email: EMAIL |
| Pseudocode | Yes | Algorithm 1 Batch Learning ... Algorithm 2 Exp Policy |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper focuses on theoretical analysis of batch linear contextual bandits with stochastic contexts. It does not mention the use of any specific publicly available datasets for experimental evaluation, nor does it provide access information for any datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments on specific datasets, therefore, it does not provide any information about dataset splits for reproducibility. |
| Hardware Specification | No | The paper presents theoretical contributions on algorithm design and regret analysis for batch linear contextual bandits. It does not describe any experimental evaluations that would require hardware specifications. |
| Software Dependencies | No | This is a theoretical paper focusing on mathematical proofs and algorithm design. It does not describe any empirical implementation or experiments, and therefore does not list any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical in nature, presenting algorithm designs and mathematical proofs. It does not include an experimental setup section with specific hyperparameters or training configurations. |