reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

Authors: Rong J.B. Zhu, Yanqi Qiu

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct two kinds of experiments. In Section 6.1, µk are random, where various settings are considered. In Section 6.2, µk are ﬁxed. Note that our modeling assumptions are violated here. We show these experiments of ﬁxed µk because they are benchmarks established by Audibert et al. (2010) and Karnin et al. (2013). Our baselines include the state-of-the-art Successive Rejects (SR) (Audibert et al., 2010), Sequential Halving (SH) (Karnin et al., 2013), the UCB-exploration (UCBE) (Audibert et al., 2010), and Top-Two Thompson sampling (TTTS) (Russo, 2020).
Researcher Affiliation	Academia	Rong J.B. Zhu EMAIL Institute of Science and Technology for Brain-inspired Intelligence, Fudan University Yanqi Qiu EMAIL School of Mathematics and Statistics, Wuhan University
Pseudocode	Yes	Algorithm 1 RUE for best-arm identiﬁcation. 1: Initialization: Pull each arm twice 2: for t = 2K + 1, . . . , n do 3: Calculate Uk,t = ˆµk,t 1 + q 2τ 2 k,t 1 log n 4: Pull the arm with the highest Uk,t for k [K] 5: Collect the reward associated the chosen arm 6: end for 7: Return estimated best arm Jn = arg maxk [K] ˆµk,n
Open Source Code	No	The paper does not contain any explicit statement about providing source code or a link to a repository.
Open Datasets	No	For random µk, we have the following three setups: (R1) Gaussian rewards with mean µk and variance σ2 = 1, where µk U(0, 0.5) for k [K]. (R2) The same µk as R1, but the rewards are Bernoulli rewards with means µk. (R3) Bernoulli rewards with µk U(0, 0.5) for k [K].
Dataset Splits	No	For random µk, we have the following three setups: (R1) Gaussian rewards with mean µk and variance σ2 = 1, where µk U(0, 0.5) for k [K].
Hardware Specification	No	The paper does not contain any explicit details about the hardware used for running experiments.
Software Dependencies	No	The paper does not mention any specific software or library versions used for implementation.
Experiment Setup	Yes	The maximum budgets are set to N = 5000 for R1 and R2, and to N = 12000 for R3. We choose these values based on the median complexity terms in our experiments, which are H 2000 for R1 and R2, and H 5500 for R3. Then we study various budget settings n {N/4, N/2, N}, so that we can show how the performance varies when the budget is less than H and double of H. UCBE is implemented with parameter a = 2n/H, since this parameter works the best overall according to Audibert et al. (2010) and Karnin et al. (2013). In RUE, we plug in the estimators of the variances σ2 and σ2 0 as in Zhu & Kveton (2022a).