A New Regret-analysis Framework for Budgeted Multi-Armed Bandits
Authors: Evan Yifan Xu, Pan Xu
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions. In this section, we describe our experimental results on both the BMAB-FC and BMAB-VC settings. |
| Researcher Affiliation | Academia | Evan Yifan Xu EMAIL Southeast University, No. 2 Sipailou, Nanjing, Jiangsu 210096 CN; Pan Xu EMAIL New Jersey Institute of Technology, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 USA |
| Pseudocode | Yes | Algorithm 1: A UCB-based Algorithm for BMAB-FC (UF).; Algorithm 2: A UCB-and-LCB-based Algorithm for BMAB-VC (ULV). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. It proposes algorithms but does not release an implementation as open source. |
| Open Datasets | No | In the first experiment, we study the gap between the actual regrets and regret bounds by following the parameter settings in (Tran-Thanh et al., 2012a). We set the number of arms K = 100. We generate the fixed cost c_j for each arm j by selecting a uniform value from [1, 10]. The reward distribution of each arm j is set to be truncated Gaussian, with mean reward v_j drawn uniformly from [0, 1] and variance σ² = v_j/2. |
| Dataset Splits | No | The paper describes generating synthetic data for each experiment run and varying parameters like budget and density gaps. It does not specify any training, validation, or test splits for a pre-existing dataset. Instead, it states: For each instance, we run all algorithms for 100 times and take the average as the final performance. |
| Hardware Specification | Yes | All experiments are conducted on a PC with Quad-Core Intel Core i7 (2GHz) processor and 8GB memory. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or library versions used for the experiments. |
| Experiment Setup | Yes | In the first experiment, we study the gap between the actual regrets and regret bounds by following the parameter settings in (Tran-Thanh et al., 2012a). We set the number of arms K = 100. We generate the fixed cost c_j for each arm j by selecting a uniform value from [1, 10]. The reward distribution of each arm j is set to be truncated Gaussian, with mean reward v_j drawn uniformly from [0, 1] and variance σ² = v_j/2. In addition, we vary the total budget B from 10³ to 10⁶. In the second experiment, we investigate how the regret bounds respond to the gap of cost means, Σ_j (c_j − c*)⁺, while fixing the minimum density gap d and the gap of reward means, Σ_j (v* − v_j)⁺. We set K = 10 and the total budget B = 10⁶. We introduce a parameter C and choose C ∈ {10¹, 10², 10³, 10⁴, 10⁵}. To fix d = 0.5, we first set the mean reward v_j for each arm j as 1, then set the cost for the first arm c_1 = 1/(d + 1/C) and the cost for the remaining arms c_j = C for all j ≠ 1. In the third experiment, we examine how the regret bounds respond to the parameter of the minimum density gap d, while fixing the other parameters. We set K = 10 and B = 100, and vary d ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹} by first fixing the cost means for all arms as follows: for arm j = 1, we set c_1 = 1; for all arms j ≠ 1, we set c_j = c_1/(1 − κ), where κ is set to 0.1 in our experiments. We set a uniform reward mean for all arms as v_j = (c_1/κ) · d. For each instance, we run all algorithms 100 times and take the average as the final performance. The detailed setting is summarized in Table 2. |
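The first experiment's instance generation, as quoted above, can be sketched in a few lines. This is a hypothetical reconstruction from the description (the paper releases no code); the `rng` seed, the rejection-sampling loop, and the `pull` helper are our assumptions, not the authors':

```python
import numpy as np

# Hypothetical sketch of the first experiment's instance generation
# (reconstructed from the paper's description, not the authors' code).
rng = np.random.default_rng(0)

K = 100                              # number of arms
costs = rng.uniform(1, 10, size=K)   # fixed cost c_j ~ U[1, 10]
means = rng.uniform(0, 1, size=K)    # mean reward v_j ~ U[0, 1]
stds = np.sqrt(means / 2)            # variance sigma^2 = v_j / 2

def pull(j):
    """Sample one reward from arm j: Gaussian truncated to [0, 1],
    drawn here by simple rejection sampling."""
    while True:
        r = rng.normal(means[j], stds[j])
        if 0.0 <= r <= 1.0:
            return float(r)
```

Per the quoted setup, the total budget B would then be swept from 10³ to 10⁶ over this fixed instance, with each configuration averaged over 100 runs.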