Delay as Payoff in MAB
Authors: Ofir Schlisselberg, Ido Cohen, Tal Lancewicki, Yishay Mansour
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we accompany our theoretical results with an empirical evaluation. We conducted synthetic experiments for both the cost and reward settings, using the algorithms in Table 1 as baselines. We show results on two representative distributions: Truncated Normal (bounded in [0, D]) and Bernoulli. ... Figure 1 shows the average cumulative regret over 10 runs. |
| Researcher Affiliation | Collaboration | ¹Tel Aviv University, ²Google Research; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Protocol1 ... Algorithm 2 Cost Successive Elimination (CSE) ... Algorithm 3 Bounded Doubling Successive Elimination ... Algorithm 4 Reward Successive Elimination ... Algorithm 5 Bounded Halving Successive Elimination (BHSE) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | We conducted synthetic experiments for both the cost and reward settings... For the truncated Normal we sample K means and standard deviations (std)... For the Bernoulli distribution, we sample K probabilities p_i uniformly in [0, 1]... |
| Dataset Splits | No | The paper states parameters for synthetic data generation (e.g., T=150,000, K=30, D=5000) but does not provide specific training/test/validation dataset splits, cross-validation, or other data partitioning details. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | All experiments use T=150,000, K=30 and D=5000. For the truncated Normal we sample K means and standard deviations (std), and adjust them to get a truncated version... For the Bernoulli distribution, we sample K probabilities p_i uniformly in [0, 1]... Figure 1 shows the average cumulative regret over 10 runs. |
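
Since no code was released, the following is only a minimal sketch of how the synthetic arms described in the Experiment Setup row might be generated. T, K, D and the number of runs come from the quoted text. Everything else is an assumption made for illustration: the ranges used to sample means and standard deviations, the use of rejection sampling to obtain a truncated Normal on [0, D] (the paper says only that the parameters are "adjusted"), and the helper names. The sketch leaves open how a Bernoulli draw is turned into a delay, because the quoted text does not say.

```python
import random

# Values quoted in the Experiment Setup row.
T, K, D = 150_000, 30, 5000
N_RUNS = 10


def make_truncated_normal_arms(k, d, rng):
    """Sample k (mean, std) pairs.

    ASSUMPTION: the uniform ranges below are hypothetical; the paper's
    actual sampling ranges are not given in the quoted text.
    """
    return [(rng.uniform(0, d), rng.uniform(0.05 * d, 0.25 * d)) for _ in range(k)]


def sample_truncated_normal(mean, std, low, high, rng):
    """Draw from a Normal restricted to [low, high] by rejection sampling.

    ASSUMPTION: rejection sampling is a stand-in technique, not necessarily
    the paper's adjustment procedure.
    """
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x


def make_bernoulli_arms(k, rng):
    """Sample k success probabilities p_i uniformly in [0, 1]."""
    return [rng.uniform(0.0, 1.0) for _ in range(k)]


def draw_bernoulli(p, rng):
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def mean_curve(curves):
    """Average several equal-length cumulative-regret curves pointwise."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
normal_arms = make_truncated_normal_arms(K, D, rng)
bernoulli_arms = make_bernoulli_arms(K, rng)
```

A bandit algorithm would then be run for T rounds against these arms, once per run, and `mean_curve` applied to the N_RUNS resulting cumulative-regret curves to obtain a plot in the style of Figure 1.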