A New Regret-analysis Framework for Budgeted Multi-Armed Bandits

Authors: Evan Yifan Xu, Pan Xu

JAIR 2025

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: "Extensive experimental results show the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions. In this section, we describe our experimental results on both the BMAB-FC setting and the BMAB-VC setting."

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: Evan Yifan Xu (EMAIL), Southeast University, No. 2 Sipailou, Nanjing, Jiangsu 210096, CN; Pan Xu (EMAIL), New Jersey Institute of Technology, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102, USA.

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: Algorithm 1: A UCB-based Algorithm for BMAB-FC (UF); Algorithm 2: A UCB-and-LCB-based Algorithm for BMAB-VC (ULV).

Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. It proposes algorithms but does not offer their implementation as open source.

Reproducibility Variable: Open Datasets
Result: No
LLM Response: "In the first experiment, we study the gap between the actual regrets and regret bounds by following the parameter settings in (Tran-Thanh et al., 2012a). We set the number of arms K = 100. We generate the fixed cost c_j for each arm j by selecting a uniform value from [1, 10]. The reward distribution of each arm j is set to be truncated Gaussian, with mean reward v_j taking a uniform value from [0, 1] and variance σ² = v_j/2."

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper describes generating synthetic data for each experiment run and varying parameters such as the budget and the density gaps. It does not specify any training, validation, or test splits for a pre-existing dataset. Instead, it states: "For each instance, we run all algorithms 100 times and take the average as the final performance."

Reproducibility Variable: Hardware Specification
Result: Yes
LLM Response: "All experiments are conducted on a PC with a Quad-Core Intel Core i7 (2 GHz) processor and 8 GB memory."

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not explicitly mention any specific software dependencies or library versions used for the experiments.

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: "In the first experiment, we study the gap between the actual regrets and regret bounds by following the parameter settings in (Tran-Thanh et al., 2012a). We set the number of arms K = 100. We generate the fixed cost c_j for each arm j by selecting a uniform value from [1, 10]. The reward distribution of each arm j is set to be truncated Gaussian, with mean reward v_j taking a uniform value from [0, 1] and variance σ² = v_j/2. In addition, we vary the total budget B from 10^3 to 10^6. In the second experiment, we investigate how the regret bounds respond to the gap of cost means, Σ_j (c_j − c^*)^+, while fixing the minimum density gap d and the gap of reward means, Σ_j (v^* − v_j)^+. We set K = 10 and the total budget B = 10^6. We introduce a parameter C and choose C ∈ {10^1, 10^2, 10^3, 10^4, 10^5}. To fix d = 0.5, we first set the mean reward v_j for each arm j as 1, then set the cost for the first arm c_1 = 1/(d + 1/C) and the cost for the remaining arms c_j = C for all j ≠ 1. In the third experiment, we examine how the regret bounds respond to the minimum density gap d, while fixing the other parameters. We set K = 10 and B = 100, and vary d ∈ {10^−4, 10^−3, 10^−2, 10^−1} by first fixing the cost means for all arms as follows: for arm j = 1, we set c_1 = 1; for all arms j ≠ 1, we set c_j = c_1/(1 − κ), where κ is set to 0.1 in our experiments. We set a uniform reward mean for all arms as v_j = (c_1/κ) · d. For each instance, we run all algorithms 100 times and take the average as the final performance. The detailed setting is summarized in Table 2."