reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification

Authors: Kapilan Balagopalan, Ngo Tuan Nguyen, Yao Zhao, Kwang-Sung Jun

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	D. Empirical studies In this section we present the study of stopping time distribution of Successive Elimination (SE) (Even-Dar et al., 2006) algorithm. Experimental setup: We implemented SE in two different configurations... In Figure 5, 6 we have plotted the histograms of observed stopping times for all the trials.
Researcher Affiliation	Academia	Kapilan Balagopalan * 1 Tuan Ngo Nguyen * 1 Yao Zhao * 1 Kwang-Sung Jun 1... 1University of Arizona. Correspondence to: Kwang-Sung Jun <EMAIL>.
Pseudocode	Yes	Algorithm 1 FC-DSH Input: A set of K arms, δ... Algorithm 2 Brake Booster Input: base trial count L1, base budget T1, algorithm A, base failure rate δ0... Algorithm 3 Budgeted Identification Input: algorithm A, the number of trials L, sampling budget per trial T, base failure rate δ0
Open Source Code	No	No explicit statement or link for open-source code for the methodology described in this paper is found.
Open Datasets	No	Experimental setup: We implemented SE in two different configurations, 1) Original version 2) Version with ε-slack added to the stopping condition. We set the number of arms 3, with mean rewards {1.0, 0.9, 0.9}. Noise follows N(0, 1).
Dataset Splits	No	Experimental setup: We implemented SE in two different configurations... We set the number of arms 3, with mean rewards {1.0, 0.9, 0.9}. Noise follows N(0, 1). We conduct 1000 trials and observe the stopping times of those trials. This describes simulation parameters and number of runs, not dataset splits.
Hardware Specification	No	No specific hardware details (like GPU/CPU models, memory) are provided for running the experiments.
Software Dependencies	No	No specific software dependencies with version numbers are mentioned. The paper only states 'We implemented SE'.
Experiment Setup	Yes	Experimental setup: We implemented SE in two different configurations... We set the number of arms 3, with mean rewards {1.0, 0.9, 0.9}. Noise follows N(0, 1). We set δ = 0.01. We conduct 1000 trials and observe the stopping times of those trials. We forcefully terminated the trials that do not stop until 30,000 time steps or 1,000,000 time steps. ... We consider a bandit problem with four arms, each associated with mean rewards of {1.0, 0.6, 0.6, 0.6}. The reward noise is drawn from a normal distribution N(0, 1). The confidence parameter is set to δ = 0.01. We perform 1000 independent trials and record the stopping times for each. Trials that do not terminate within 1,000,000 time steps are forcefully stopped at that point. ... In these experiments, we employ a 1.2 growth factor for both the per-trial budget and the number of trials, in contrast to the conventional doubling scheme. ... For the implementation of FC-DSH algorithm, we deviate from the theoretical version presented in the paper by reusing samples across rounds. ... we generalize the growth schedule by introducing a scaling parameter b, such that Tm = bm 1T1, for phase m 2. While our analysis focuses on the case b = 2, we speculate that the exponential guarantee may hold for any value of b. In our experiments, we set b = 1.01 to allow for finer budget increment. Additionally, we adopt the stopping rule proposed by (Jourdan et al., 2022) across all algorithms to ensure fair comparison.