Optimal and Practical Batched Linear Bandit Algorithm

Authors: Sanghoon Yu, Min-Hwan Oh

ICML 2025

Reproducibility assessment (Variable | Result | LLM Response):
Research Type Experimental Extensive numerical evaluations demonstrate that BLAE consistently and substantially outperforms existing batched linear bandit algorithms, offering a combination of provable near-optimality across all regimes and strong empirical performance. Overall, BLAE demonstrates that provably optimal worst-case regret can be achieved alongside low batch complexity and tractable computation, key objectives in batched bandit research. By uniting strong theoretical guarantees with robust empirical performance, BLAE helps resolve the long-standing tension between provable efficiency and practical viability in the batched linear bandit setting.
Researcher Affiliation Academia Seoul National University, Seoul, Republic of Korea. Correspondence to: Min-Hwan Oh <EMAIL>.
Pseudocode Yes Algorithm 1 BLAE
Open Source Code No The paper does not explicitly state that the authors' implementation of BLAE is open-source or provide a link to its repository. It mentions using "the official implementation released by Ren et al. (2024)" for a comparison, but not for their own algorithm.
Open Datasets No The K arms and θ are randomly sampled from a d-dimensional uniform distribution. The experiment is repeated 10 times, with the arms being independently resampled from the uniform distribution in each run. We conduct experiments with the following (K, d) parameter pairs: (K, d) = (50, 5), (50, 10), (100, 5), (100, 10), (200, 5), (200, 10), (400, 5), (400, 10). In this section, we provide additional experimental results for the instances where arms are randomly sampled from a Gaussian distribution, following the same format as in Figure 1. Similar to the uniform distribution case, we repeat each experiment 10 times, independently resampling the arms from the Gaussian distribution in every run.
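The synthetic data generation quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper says only that arms and θ are drawn from "a d-dimensional uniform distribution", so the support [-1, 1]^d and the function name `sample_instance` are assumptions.

```python
import numpy as np

def sample_instance(K, d, rng):
    """Sample one synthetic linear bandit instance: K arm feature
    vectors and a parameter vector theta, each drawn i.i.d. from a
    uniform distribution on [-1, 1]^d (support is an assumption)."""
    arms = rng.uniform(-1.0, 1.0, size=(K, d))
    theta = rng.uniform(-1.0, 1.0, size=d)
    return arms, theta

# Each (K, d) configuration is repeated over 10 independent runs,
# resampling arms and theta in every run, as the report describes.
configs = [(50, 5), (50, 10), (100, 5), (100, 10),
           (200, 5), (200, 10), (400, 5), (400, 10)]
rng = np.random.default_rng(0)
for K, d in configs:
    for run in range(10):
        arms, theta = sample_instance(K, d, rng)
        expected_rewards = arms @ theta  # mean reward of each arm
```

A Gaussian variant of the same setup would simply replace `rng.uniform` with `rng.normal`.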
Dataset Splits No The paper describes generating synthetic data by sampling arms and parameters from uniform or Gaussian distributions for each experimental run and repeating experiments. It does not refer to a fixed dataset or its training/validation/test splits.
Hardware Specification No The paper mentions runtime comparisons but does not specify the hardware (e.g., CPU, GPU models, memory) used for its experiments. For instance, in Section E.1, it states, "The experiments were conducted with K = 50 arms and a feature dimension of d = 5... the BLAE algorithm requires an average of 0.73 seconds over 10 runs when T = 100,000" but without detailing the computing environment.
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes In this section, we empirically evaluate the performance of the BLAE algorithm. We measure the cumulative regret over T = 100,000 time steps. The K arms and θ are randomly sampled from a d-dimensional uniform distribution. The experiment is repeated 10 times, with the arms being independently resampled from the uniform distribution in each run. We conduct experiments with the following (K, d) parameter pairs: (K, d) = (50, 5), (50, 10), (100, 5), (100, 10), (200, 5), (200, 10), (400, 5), (400, 10). We compare the performance of BLAE with the previously known state-of-the-art algorithms: Explore-Estimate-Eliminate-Exploit (denoted by E4; Ren et al. 2024), Rarely Switching OFUL (denoted by RS-OFUL; Abbasi-Yadkori et al. 2011), and the phased elimination algorithm with D-optimal design (denoted by Pha-Elim-D; Esfandiari et al. 2021; Lattimore & Szepesvári 2020). For BLAE, we set the parameters as λ = 1 and c_ℓ = |A_{ℓ-1}| / |A_0| for ℓ ≥ 1. This choice allows c_ℓ to adaptively reflect the level of uncertainty based on the number of remaining arms, ensuring both flexibility and practical applicability. For E4, we configure the parameters identically to the practical setup in Ren et al. (2024). For RS-OFUL, we set the switching parameter to C = 0.5. For Pha-Elim-D, we design the batch sizes as T^{1 - 2^{-i}} instead of q^i to achieve better performance.
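The batch-size schedule mentioned for the phased elimination baseline can be illustrated with a short sketch. The grid endpoints t_i = T^(1 - 2^(-i)) follow the standard minimax batch grid for batched bandits; the rounding and clipping choices below, and the function name `batch_grid`, are assumptions, not the authors' implementation.

```python
import math

def batch_grid(T, num_batches):
    """Endpoints t_i = T^(1 - 2^(-i)) of a minimax batch grid,
    rounded up and clipped to the horizon T (rounding choices
    are assumptions made for this sketch)."""
    ends = [min(T, math.ceil(T ** (1.0 - 2.0 ** (-i))))
            for i in range(1, num_batches + 1)]
    ends[-1] = T  # ensure the final batch reaches the horizon
    return ends

# For the horizon used in the paper's experiments (T = 100,000),
# the first endpoint is roughly sqrt(T) and the grid grows toward T.
grid = batch_grid(100_000, 5)
```

By contrast, a geometric grid q^i grows by a fixed factor per batch; the doubling exponent above front-loads exploration into a small first batch and lengthens later batches.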