p-Mean Regret for Stochastic Bandits
Authors: Anand Krishna, Philips George John, Adarsh Barik, Vincent Y. F. Tan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have also performed some synthetic experiments to compare the p-mean-regrets achieved by EXPLORE-THEN-UCB versus NCB and UCB1 as baselines; these results can be found in the extended version (Krishna et al. 2024). |
| Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, NUS; (2) CNRS-CREATE & Department of Computer Science, NUS; (3) Institute of Data Science, NUS; (4) Department of Mathematics, NUS |
| Pseudocode | Yes | Algorithm 1: EXPLORE-THEN-UCB. Parameters: time horizon $T$, number of arms $k$, exploration period $\tilde{T}$. For $t \leftarrow 1, \ldots, \tilde{T}$ do: uniformly sample $i_t$ from $[k]$; pull arm $i_t$ and observe the reward $X_t$; increment $n_{i_t,t}$ by one and update $\hat{\mu}_{i_t,t}$. For $t \leftarrow \tilde{T}+1, \ldots, T$ do: let $\mathrm{UCB}_{i,t-1} \leftarrow \hat{\mu}_{i,t-1} + 4\sqrt{\log T / n_{i,t-1}}$; select $i_t \leftarrow \arg\max_{i \in [k]} \mathrm{UCB}_{i,t-1}$; pull arm $i_t$ and observe the reward $X_t$; update $n_{i_t,t}$ and $\hat{\mu}_{i_t,t}$. |
| Open Source Code | Yes | Code https://github.com/philips-george/p-mean-regret-stochastic-bandits |
| Open Datasets | No | The paper does not explicitly refer to any named datasets or provide access information for a publicly available or open dataset. It discusses the stochastic Multi-Armed Bandit problem framework, which uses simulated or generated rewards, but no specific dataset is identified. |
| Dataset Splits | No | The paper does not use or refer to any specific publicly available dataset; therefore, no dataset split information is provided. |
| Hardware Specification | No | The paper mentions 'synthetic experiments' in the future work section, but does not provide any specific details about the hardware used for these experiments. |
| Software Dependencies | No | The paper describes a new algorithm and theoretical results but does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the parameters of its proposed algorithm (e.g., 'Time horizon T, number of arms k, exploration period T') but does not provide specific experimental setup details such as hyperparameters, learning rates, or batch sizes for actual experiments mentioned in the future work section. |
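
The two phases of the Algorithm 1 pseudocode quoted in the table (uniform exploration, then UCB selection with a bonus of 4·sqrt(log T / n)) can be sketched in Python as follows. This is a minimal illustration and not the authors' released code. It makes several assumptions: the function name `explore_then_ucb`, the parameter names `means`, `horizon`, `explore_rounds`, and `seed`, and the Bernoulli reward model are all hypothetical, and treating a never-pulled arm as having an infinite index is our choice, since the pseudocode does not address that case.

```python
import math
import random


def explore_then_ucb(means, horizon, explore_rounds, seed=0):
    """Simulate one run on Bernoulli arms (assumed reward model).

    means          -- success probability of each arm (hypothetical input)
    horizon        -- time horizon T
    explore_rounds -- length of the uniform exploration phase
    Returns (pull counts, empirical means) after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # n_i: number of pulls of arm i so far
    mu_hat = [0.0] * k    # empirical mean reward of arm i

    def pull(i):
        # Draw a Bernoulli reward and update the running mean of arm i.
        reward = 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1
        mu_hat[i] += (reward - mu_hat[i]) / counts[i]

    # Phase 1: pull an arm chosen uniformly at random in every round.
    for _ in range(explore_rounds):
        pull(rng.randrange(k))

    def ucb(i):
        # Assumption: an arm that was never pulled gets an infinite index.
        if counts[i] == 0:
            return math.inf
        return mu_hat[i] + 4.0 * math.sqrt(math.log(horizon) / counts[i])

    # Phase 2: pull the arm with the largest upper confidence bound.
    for _ in range(explore_rounds, horizon):
        pull(max(range(k), key=ucb))

    return counts, mu_hat


if __name__ == "__main__":
    counts, mu_hat = explore_then_ucb([0.2, 0.5, 0.8], horizon=5000, explore_rounds=300)
    print(counts, [round(m, 3) for m in mu_hat])
```

The horizon and exploration length in the usage lines are arbitrary illustrative values, not settings reported in the paper.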