reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

Authors: Zihan Zhang, Xiangyang Ji, Yuan Zhou

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the optimal batch-regret tradeoff for batch linear contextual bandits. For this problem, we design batch learning algorithms and prove that they achieve the optimal regret bounds (up to logarithmic factors) for any batch number M, number of actions K, time horizon T, and dimension d. Therefore, we establish the full-parameter-range almost optimal batch-regret tradeoff for the batch linear contextual bandit problem. Along our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the ﬁrst of its kind in literature and maybe of independent interest.
Researcher Affiliation	Academia	Paul G. Allen School of CSE, University of Washington. Email: EMAIL Department of Automation, Tsinghua University. Email: EMAIL Yau Mathematical Sciences Center, Tsinghua University and Beijing Institute of Mathematical Sciences and Applications. Email: EMAIL
Pseudocode	Yes	Algorithm 1 Batch Learning ... Algorithm 2 Exp Policy
Open Source Code	No	The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets	No	The paper focuses on theoretical analysis of batch linear contextual bandits with stochastic contexts. It does not mention the use of any specific publicly available datasets for experimental evaluation, nor does it provide access information for any datasets.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments on specific datasets, therefore, it does not provide any information about dataset splits for reproducibility.
Hardware Specification	No	The paper presents theoretical contributions on algorithm design and regret analysis for batch linear contextual bandits. It does not describe any experimental evaluations that would require hardware specifications.
Software Dependencies	No	This is a theoretical paper focusing on mathematical proofs and algorithm design. It does not describe any empirical implementation or experiments, and therefore does not list any software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical in nature, presenting algorithm designs and mathematical proofs. It does not include an experimental setup section with specific hyperparameters or training configurations.