Optimal and Practical Batched Linear Bandit Algorithm

Authors: Sanghoon Yu, Min-Hwan Oh

ICML 2025

Reproducibility assessment (Variable | Result | LLM Response):
Research Type Experimental Extensive numerical evaluations demonstrate that BLAE consistently and substantially outperforms existing batched linear bandit algorithms, offering a combination of provable near-optimality across all regimes and strong empirical performance. Overall, BLAE demonstrates that provably optimal worst-case regret can be achieved alongside low batch complexity and tractable computation, key objectives in batched bandit research. By uniting strong theoretical guarantees with robust empirical performance, BLAE helps resolve the long-standing tension between provable efficiency and practical viability in the batched linear bandit setting.
Researcher Affiliation Academia Seoul National University, Seoul, Republic of Korea. Correspondence to: Min-Hwan Oh <EMAIL>.
Pseudocode Yes Algorithm 1 BLAE
Open Source Code No The paper does not explicitly state that the authors' implementation of BLAE is open-source or provide a link to its repository. It mentions using "the official implementation released by Ren et al. (2024)" for a comparison, but not for their own algorithm.
Open Datasets No The K arms and θ are randomly sampled from a d-dimensional uniform distribution. The experiment is repeated 10 times, with the arms being independently resampled from the uniform distribution in each run. We conduct experiments with the following (K, d) parameter pairs: (K, d) = (50, 5), (50, 10), (100, 5), (100, 10), (200, 5), (200, 10), (400, 5), (400, 10). In this section, we provide additional experimental results for the instances where arms are randomly sampled from a Gaussian distribution, following the same format as in Figure 1. Similar to the uniform distribution case, we repeat each experiment 10 times, independently resampling the arms from the Gaussian distribution in every run.
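The synthetic data generation quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper says only that arms and θ are drawn from "a d-dimensional uniform distribution", so the support [-1, 1]^d and the function name `sample_instance` are assumptions.

```python
import numpy as np

def sample_instance(K, d, rng):
    """Sample one synthetic linear bandit instance: K arm feature
    vectors and a parameter vector theta, each drawn i.i.d. from a
    uniform distribution on [-1, 1]^d (support is an assumption)."""
    arms = rng.uniform(-1.0, 1.0, size=(K, d))
    theta = rng.uniform(-1.0, 1.0, size=d)
    return arms, theta

# Each (K, d) configuration is repeated over 10 independent runs,
# resampling arms and theta in every run, as the report describes.
configs = [(50, 5), (50, 10), (100, 5), (100, 10),
           (200, 5), (200, 10), (400, 5), (400, 10)]
rng = np.random.default_rng(0)
for K, d in configs:
    for run in range(10):
        arms, theta = sample_instance(K, d, rng)
        expected_rewards = arms @ theta  # mean reward of each arm
```

A Gaussian variant of the same setup would simply replace `rng.uniform` with `rng.normal`.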
Dataset Splits No The paper describes generating synthetic data by sampling arms and parameters from uniform or Gaussian distributions for each experimental run and repeating experiments. It does not refer to a fixed dataset or its training/validation/test splits.
Hardware Specification No The paper mentions runtime comparisons but does not specify the hardware (e.g., CPU, GPU models, memory) used for its experiments. For instance, in Section E.1, it states, "The experiments were conducted with K = 50 arms and a feature dimension of d = 5... the BLAE algorithm requires an average of 0.73 seconds over 10 runs when T = 100,000" but without detailing the computing environment.
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes In this section, we empirically evaluate the performance of the BLAE algorithm. We measure the cumulative regret over T = 100,000 time steps. The K arms and θ are randomly sampled from a d-dimensional uniform distribution. The experiment is repeated 10 times, with the arms being independently resampled from the uniform distribution in each run. We conduct experiments with the following (K, d) parameter pairs: (K, d) = (50, 5), (50, 10), (100, 5), (100, 10), (200, 5), (200, 10), (400, 5), (400, 10). We compare the performance of BLAE with the previously known state-of-the-art algorithms: Explore-Estimate-Eliminate-Exploit (denoted by E4; Ren et al. 2024), Rarely Switching OFUL (denoted by RS-OFUL; Abbasi-Yadkori et al. 2011), and the phased elimination algorithm with D-optimal design (denoted by Pha-Elim-D; Esfandiari et al. 2021; Lattimore & Szepesvári 2020). For BLAE, we set the parameters as λ = 1 and c_ℓ = |A_{ℓ-1}| / |A_0| for ℓ ≥ 1. This choice allows c_ℓ to adaptively reflect the level of uncertainty based on the number of remaining arms, ensuring both flexibility and practical applicability. For E4, we configure the parameters identically to the practical setup in Ren et al. (2024). For RS-OFUL, we set the switching parameter to C = 0.5. For Pha-Elim-D, we design the batch sizes as T^{1 - 2^{-i}} instead of q^i to achieve better performance.
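The batch-size schedule mentioned for the phased elimination baseline can be illustrated with a short sketch. The grid endpoints t_i = T^(1 - 2^(-i)) follow the standard minimax batch grid for batched bandits; the rounding and clipping choices below, and the function name `batch_grid`, are assumptions, not the authors' implementation.

```python
import math

def batch_grid(T, num_batches):
    """Endpoints t_i = T^(1 - 2^(-i)) of a minimax batch grid,
    rounded up and clipped to the horizon T (rounding choices
    are assumptions made for this sketch)."""
    ends = [min(T, math.ceil(T ** (1.0 - 2.0 ** (-i))))
            for i in range(1, num_batches + 1)]
    ends[-1] = T  # ensure the final batch reaches the horizon
    return ends

# For the horizon used in the paper's experiments (T = 100,000),
# the first endpoint is roughly sqrt(T) and the grid grows toward T.
grid = batch_grid(100_000, 5)
```

By contrast, a geometric grid q^i grows by a fixed factor per batch; the doubling exponent above front-loads exploration into a small first batch and lengthens later batches.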