Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
Authors: Michael Chen, A. Pavan, N. V. Vinodchandran, Ruosong Wang, Lin Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper investigates list replicability (Dixon et al., 2023) in the context of multi-armed (and linear) bandits (MAB). An algorithm A for MAB is (ℓ, δ)-list replicable if, with probability at least 1 − δ, A produces at most ℓ traces across independent executions even with different random bits, where a trace is the sequence of arms played during an execution. For k-armed bandits, although the total number of traces can be Ω(k^T) for a time horizon T, the paper presents several surprising upper bounds that are either independent of, or logarithmic in, T: (1) a (2^k, δ)-list replicable algorithm with near-optimal regret Õ(√(kT)); (2) an (O(k/δ), δ)-list replicable algorithm with regret Õ((k/δ)√(kT)); (3) a ((k + 1)^(B−1), δ)-list replicable algorithm with regret Õ(k^(3/2) T^(1/2 + 2^−(B+1))) for any integer B > 1. On the other hand, for the sublinear-regret regime, the paper establishes a matching lower bound on the list complexity (the parameter ℓ): there is no (k − 1, δ)-list replicable algorithm with o(T) regret. |
| Researcher Affiliation | Academia | Michael Chen (Iowa State University); A. Pavan (Iowa State University); N. V. Vinodchandran (University of Nebraska–Lincoln); Ruosong Wang (Peking University); Lin F. Yang (University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1: (2^k, δ)-List Replicable Algorithm for MAB; Algorithm 2: (12k/δ, δ)-List Replicable Algorithm for MAB; Algorithm 3: ((k + 1)^(B−1), δ)-List Replicable Algorithm for MAB; Algorithm 4: ((2d + 1)^(B−1), δ)-List Replicable Linear Bandit Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper analyzes theoretical multi-armed bandit (MAB) instances and linear bandit settings. It describes the data-generation process for these theoretical models (e.g., "the reward r_t is an i.i.d. sample drawn from D_{a_t}") but does not mention or use any publicly available real-world datasets for empirical evaluation. Therefore, no concrete access information for an open dataset is provided. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design for bandit problems. It does not conduct experiments on empirical datasets, and thus, no training/test/validation dataset splits are discussed or provided. |
| Hardware Specification | No | This paper presents theoretical results, including algorithm design, regret analysis, and lower bounds, for replicable bandit learning. It does not involve experimental evaluations requiring specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, focusing on algorithm design and mathematical proofs for bandit problems. It does not describe any implementation details or experiments that would require specific software dependencies or versions. |
| Experiment Setup | No | The paper presents theoretical algorithms and their mathematical analysis (e.g., regret bounds and list replicability). It does not describe any empirical experiments, and therefore, does not provide details about hyperparameters, training configurations, or other system-level experimental setup settings. |
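The (ℓ, δ)-list replicability notion quoted above can be made concrete with a small simulation: run a bandit policy many times with independent randomness, record each trace (the sequence of arms played), and count the distinct traces. The sketch below is illustrative only and is not one of the paper's algorithms; the explore-then-commit policy, Gaussian rewards, and all function names are assumptions for the demonstration. Because its exploration phase is a deterministic round-robin, this policy's traces differ only in the committed arm, so at most k distinct traces can occur.

```python
import random
from collections import Counter

def run_trace(k, T, means, seed):
    """One independent execution of a simple explore-then-commit policy
    (illustrative, not the paper's algorithm); returns the trace, i.e.
    the sequence of arms played over the horizon T."""
    rng = random.Random(seed)
    trace = []
    totals = [0.0] * k
    counts = [0] * k
    # Exploration: pull each arm m times in a fixed round-robin order,
    # so this prefix of the trace is identical across executions.
    m = max(1, T // (2 * k))
    for t in range(m * k):
        a = t % k
        r = rng.gauss(means[a], 1.0)  # noisy reward sample (assumed Gaussian)
        totals[a] += r
        counts[a] += 1
        trace.append(a)
    # Commit: play the empirically best arm for the remaining rounds.
    best = max(range(k), key=lambda a: totals[a] / counts[a])
    trace.extend([best] * (T - m * k))
    return tuple(trace)

def empirical_list_size(k, T, means, runs=50):
    """Number of distinct traces over independent executions. An
    (l, delta)-list replicable algorithm keeps this count at most l
    with probability at least 1 - delta."""
    traces = Counter(run_trace(k, T, means, seed) for seed in range(runs))
    return len(traces)

print(empirical_list_size(k=3, T=60, means=[0.1, 0.5, 0.9]))
```

Since only the commit decision is random, the printed count is at most k = 3 here, far below the Ω(k^T) worst-case trace count that an arbitrary adaptive policy could exhibit; the paper's contribution is achieving small list size while also guaranteeing low regret.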