Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
Authors: Michael Chen, A. Pavan, N. V. Vinodchandran, Ruosong Wang, Lin Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper investigates list replicability (Dixon et al., 2023) in the context of multi-armed (and linear) bandits (MAB). An algorithm A for MAB is (ℓ, δ)-list replicable if, with probability at least 1 − δ, A produces at most ℓ traces across independent executions even with different random bits, where a trace is the sequence of arms played during an execution. For k-armed bandits, although the total number of traces can be Ω(k^T) for a time horizon T, the paper presents several surprising upper bounds that are either independent of, or logarithmic in, T: (1) a (2^k, δ)-list replicable algorithm with near-optimal regret Õ(√(kT)); (2) an (O(k/δ), δ)-list replicable algorithm with regret Õ((k/δ)√(kT)); (3) a ((k + 1)^(B−1), δ)-list replicable algorithm with regret Õ(k^(3/2) T^(1/2 + 2^−(B+1))) for any integer B > 1. On the other hand, for the sublinear-regret regime, the paper establishes a matching lower bound on the list complexity (the parameter ℓ): there is no (k − 1, δ)-list replicable algorithm with o(T) regret. |
| Researcher Affiliation | Academia | Michael Chen (Iowa State University); A. Pavan (Iowa State University); N. V. Vinodchandran (University of Nebraska–Lincoln); Ruosong Wang (Peking University); Lin F. Yang (University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1: (2^k, δ)-List Replicable Algorithm for MAB; Algorithm 2: (12k/δ, δ)-List Replicable Algorithm for MAB; Algorithm 3: ((k + 1)^(B−1), δ)-List Replicable Algorithm for MAB; Algorithm 4: ((2d + 1)^(B−1), δ)-List Replicable Linear Bandit Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper analyzes theoretical multi-armed bandit (MAB) instances and linear bandit settings. It describes the data-generation process for these theoretical models (e.g., "the reward r_t is an i.i.d. sample drawn from D_{a_t}") but does not mention or use any publicly available real-world datasets for empirical evaluation. Therefore, no concrete access information for an open dataset is provided. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design for bandit problems. It does not conduct experiments on empirical datasets, and thus, no training/test/validation dataset splits are discussed or provided. |
| Hardware Specification | No | This paper presents theoretical results, including algorithm design, regret analysis, and lower bounds, for replicable bandit learning. It does not involve experimental evaluations requiring specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, focusing on algorithm design and mathematical proofs for bandit problems. It does not describe any implementation details or experiments that would require specific software dependencies or versions. |
| Experiment Setup | No | The paper presents theoretical algorithms and their mathematical analysis (e.g., regret bounds and list replicability). It does not describe any empirical experiments, and therefore, does not provide details about hyperparameters, training configurations, or other system-level experimental setup settings. |
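The (ℓ, δ)-list replicability notion quoted above can be made concrete with a small simulation: run a bandit policy many times with independent randomness, record each trace (the sequence of arms played), and count the distinct traces. The sketch below is illustrative only and is not one of the paper's algorithms; the explore-then-commit policy, Gaussian rewards, and all function names are assumptions for the demonstration. Because its exploration phase is a deterministic round-robin, this policy's traces differ only in the committed arm, so at most k distinct traces can occur.

```python
import random
from collections import Counter

def run_trace(k, T, means, seed):
    """One independent execution of a simple explore-then-commit policy
    (illustrative, not the paper's algorithm); returns the trace, i.e.
    the sequence of arms played over the horizon T."""
    rng = random.Random(seed)
    trace = []
    totals = [0.0] * k
    counts = [0] * k
    # Exploration: pull each arm m times in a fixed round-robin order,
    # so this prefix of the trace is identical across executions.
    m = max(1, T // (2 * k))
    for t in range(m * k):
        a = t % k
        r = rng.gauss(means[a], 1.0)  # noisy reward sample (assumed Gaussian)
        totals[a] += r
        counts[a] += 1
        trace.append(a)
    # Commit: play the empirically best arm for the remaining rounds.
    best = max(range(k), key=lambda a: totals[a] / counts[a])
    trace.extend([best] * (T - m * k))
    return tuple(trace)

def empirical_list_size(k, T, means, runs=50):
    """Number of distinct traces over independent executions. An
    (l, delta)-list replicable algorithm keeps this count at most l
    with probability at least 1 - delta."""
    traces = Counter(run_trace(k, T, means, seed) for seed in range(runs))
    return len(traces)

print(empirical_list_size(k=3, T=60, means=[0.1, 0.5, 0.9]))
```

Since only the commit decision is random, the printed count is at most k = 3 here, far below the Ω(k^T) worst-case trace count that an arbitrary adaptive policy could exhibit; the paper's contribution is achieving small list size while also guaranteeing low regret.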