The Batch Complexity of Bandit Pure Exploration

Authors: Adrienne Tuynman, Rémy Degenne

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 3.6, "Experiments on the BAI setting": "Our algorithm PET is near-optimal in round and sample complexities for many pure exploration problems, and has theoretical guarantees for any pure exploration problem. To ascertain its practical performance, we compare it to baselines and state-of-the-art algorithms for best arm identification and thresholding bandits. Each experiment is repeated over 1000 runs. All reward distributions are Gaussian with variance 1 and we use the confidence level δ = 0.05."
Researcher Affiliation | Academia | Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France. Correspondence to: Adrienne Tuynman <EMAIL>, Rémy Degenne <EMAIL>.
Pseudocode | Yes | Algorithm 1: Phased Explore then Track (PET).
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of source code for the methodology described.
Open Datasets | No | No public dataset is used. For the BAI experiment, each algorithm is run on synthetically generated 10-arm instances where the best arm has mean 1 and each other arm i has mean uniformly sampled between 0.6 and 0.9.
Dataset Splits | No | The paper describes generating "10-arm instances where the best arm has mean 1, and each other arm i has mean uniformly sampled between 0.6 and 0.9". This indicates synthetic data generation rather than the use of a fixed dataset with predefined train/validation/test splits.
Hardware Specification | No | The paper reports experimental results but provides no details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not name any specific software, libraries, or frameworks, with or without version numbers, used to implement or run the experiments.
Experiment Setup | Yes | "Each experiment is repeated over 1000 runs. All reward distributions are Gaussian with variance 1 and we use the confidence level δ = 0.05... Our algorithm PET, with T0 = 1; Opt-BBAI (Jin et al., 2023) with α = 1.05... For the BAI experiment, we run each algorithm on 10-arm instances where the best arm has mean 1, and each other arm i has mean uniformly sampled between 0.6 and 0.9."
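The quoted setup is concrete enough to reproduce in simulation. The sketch below regenerates the paper's synthetic 10-arm BAI instances (best arm mean 1, others uniform in [0.6, 0.9]) and runs the repeated-trials protocol (1000 independent runs, Gaussian rewards with variance 1, δ = 0.05). The selection rule here is a plain uniform-allocation placeholder, not the paper's PET algorithm, whose pseudocode is not reproduced in this report; function names and the per-arm budget are illustrative assumptions.

```python
import numpy as np

def make_bai_instance(rng, n_arms=10):
    # Instance from the paper's BAI experiment: the best arm has
    # mean 1, each other arm's mean is uniform in [0.6, 0.9].
    return np.concatenate(([1.0], rng.uniform(0.6, 0.9, size=n_arms - 1)))

def run_once(rng, means, pulls_per_arm=2000):
    # Placeholder selection rule (NOT PET): pull every arm equally
    # with Gaussian rewards of variance 1, return the empirical argmax.
    samples = rng.normal(loc=means, scale=1.0,
                         size=(pulls_per_arm, len(means)))
    return int(samples.mean(axis=0).argmax())

delta = 0.05          # confidence level used in the paper
n_runs = 1000         # each experiment is repeated over 1000 runs
rng = np.random.default_rng(42)

means = make_bai_instance(rng)
errors = sum(run_once(rng, means) != 0 for _ in range(n_runs))
error_rate = errors / n_runs  # fraction of runs missing the best arm
```

With a minimum gap of 0.1 and 2000 pulls per arm, even this naive allocation identifies the best arm with error well below δ; a δ-correct algorithm like PET would instead adapt its stopping time to the instance.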