UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

Authors: Rong J.B. Zhu, Yanqi Qiu

TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct two kinds of experiments. In Section 6.1, µk are random, where various settings are considered. In Section 6.2, µk are fixed. Note that our modeling assumptions are violated here. We show these experiments of fixed µk because they are benchmarks established by Audibert et al. (2010) and Karnin et al. (2013). Our baselines include the state-of-the-art Successive Rejects (SR) (Audibert et al., 2010), Sequential Halving (SH) (Karnin et al., 2013), the UCB-exploration (UCBE) (Audibert et al., 2010), and Top-Two Thompson sampling (TTTS) (Russo, 2020).
Researcher Affiliation Academia Rong J.B. Zhu EMAIL Institute of Science and Technology for Brain-inspired Intelligence, Fudan University Yanqi Qiu EMAIL School of Mathematics and Statistics, Wuhan University
Pseudocode Yes Algorithm 1 RUE for best-arm identification. 1: Initialization: Pull each arm twice 2: for t = 2K + 1, . . . , n do 3: Calculate Uk,t = ˆµk,t 1 + q 2τ 2 k,t 1 log n 4: Pull the arm with the highest Uk,t for k [K] 5: Collect the reward associated the chosen arm 6: end for 7: Return estimated best arm Jn = arg maxk [K] ˆµk,n
Open Source Code No The paper does not contain any explicit statement about providing source code or a link to a repository.
Open Datasets No For random µk, we have the following three setups: (R1) Gaussian rewards with mean µk and variance σ2 = 1, where µk U(0, 0.5) for k [K]. (R2) The same µk as R1, but the rewards are Bernoulli rewards with means µk. (R3) Bernoulli rewards with µk U(0, 0.5) for k [K].
Dataset Splits No For random µk, we have the following three setups: (R1) Gaussian rewards with mean µk and variance σ2 = 1, where µk U(0, 0.5) for k [K].
Hardware Specification No The paper does not contain any explicit details about the hardware used for running experiments.
Software Dependencies No The paper does not mention any specific software or library versions used for implementation.
Experiment Setup Yes The maximum budgets are set to N = 5000 for R1 and R2, and to N = 12000 for R3. We choose these values based on the median complexity terms in our experiments, which are H 2000 for R1 and R2, and H 5500 for R3. Then we study various budget settings n {N/4, N/2, N}, so that we can show how the performance varies when the budget is less than H and double of H. UCBE is implemented with parameter a = 2n/H, since this parameter works the best overall according to Audibert et al. (2010) and Karnin et al. (2013). In RUE, we plug in the estimators of the variances σ2 and σ2 0 as in Zhu & Kveton (2022a).