Continuous-in-time Limit for Bayesian Bandits
Authors: Yuhua Zhu, Zachary Izzo, Lexing Ying
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, titled 'Numerical experiments', the paper states: 'We compare the performance of the approximate Bayes-optimal policy (Algorithm 1) with Thompson sampling and UCB in terms of the expected regret.' It also presents 'Figure 4: The above plot shows the expected regret in the K-armed normal bandit problem for the approximate Bayes-optimal policy, Thompson sampling, and UCB.' This indicates empirical evaluation and data analysis. |
| Researcher Affiliation | Academia | All listed authors are affiliated with academic institutions: 'University of California San Diego' and 'Stanford University'. Their email addresses use the '.edu' domain, confirming academic affiliations. |
| Pseudocode | Yes | The paper includes 'Algorithm 1 Approximate Bayes-optimal policy' in Section 4.2, providing structured steps for their proposed method. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide links to any code repositories. The provided URL refers to the paper's attribution requirements. |
| Open Datasets | No | The paper discusses 'K-armed normal bandit problem' and 'linear bandits' within the context of numerical experiments. These are problem settings for simulations rather than specific, pre-existing datasets. No external or publicly available datasets are mentioned, nor are any links, DOIs, or citations provided for data access. |
| Dataset Splits | No | The paper describes simulation setups, such as 'The horizon is set to be n = 10^3' and 'The expected regret is averaged over 10^3 simulations.' Since the experiments are based on simulations rather than pre-existing datasets, the concept of explicit train/test/validation splits does not apply. No such splits are mentioned for any data used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the numerical experiments, such as GPU models, CPU types, or memory specifications. It only mentions general aspects of computational cost. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers, such as programming languages, libraries, or solvers used for implementation or numerical solutions. It only describes the mathematical framework and algorithms. |
| Experiment Setup | Yes | Section 6 'Numerical experiments' provides specific setup details: 'The horizon is set to be n = 10^3' and 'The initial prior measure for both the Bayes-optimal policy and Thompson sampling is ν_k ~ N(1/n, 1/n) for all k. This implies that the limiting HJB equation is (11) with µ̂_k(s, q) = (s_k + 1)/(q_k + 1) and σ̂ ≡ 1. In addition, δ = n^{-2} for the UCB algorithm.' For linear bandits it states: 'We solve the limiting HJB equation by the numerical scheme (31)-(32) with δt = δq = 1/N, δs = 1/N and N = 100.' |
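Since the paper releases no code, the baseline comparison described above can only be approximated. Below is a minimal sketch of the two *benchmark* policies (Thompson sampling and UCB) on a K-armed normal bandit with the quoted horizon-style setup; it does not implement the paper's Algorithm 1, which requires numerically solving the limiting HJB equation. All function names, the standard-normal prior on arm means, and the reduced simulation counts are assumptions, not the paper's exact configuration.

```python
import numpy as np

def simulate_regret(policy, K=2, n=1000, n_sims=100, seed=0):
    """Average expected regret of a bandit policy on a K-armed normal
    bandit: unit-variance rewards, arm means drawn from a N(0, 1) prior
    (an assumed stand-in for the paper's prior)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_sims):
        mu = rng.normal(0.0, 1.0, size=K)  # true arm means for this run
        counts = np.zeros(K)               # q_k: pulls of each arm
        sums = np.zeros(K)                 # s_k: summed rewards of each arm
        regret = 0.0
        for t in range(1, n + 1):
            k = policy(sums, counts, t, rng)
            reward = rng.normal(mu[k], 1.0)
            sums[k] += reward
            counts[k] += 1
            regret += mu.max() - mu[k]     # per-step expected regret
        total += regret
    return total / n_sims

def thompson(sums, counts, t, rng):
    # Conjugate posterior under a N(0, 1) prior and unit-variance rewards:
    # mean s_k / (q_k + 1), variance 1 / (q_k + 1); sample and play argmax.
    post_mean = sums / (counts + 1.0)
    post_var = 1.0 / (counts + 1.0)
    return int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))

def ucb(sums, counts, t, rng, delta=1e-6):
    # UCB(delta) index; the paper's quoted choice corresponds to
    # delta = n^{-2}. Each arm is played once before indices are used.
    if np.any(counts == 0):
        return int(np.argmin(counts))
    bonus = np.sqrt(2.0 * np.log(1.0 / delta) / counts)
    return int(np.argmax(sums / counts + bonus))
```

With `n = 10^3` and averaging over `10^3` simulations, `simulate_regret(thompson, n=1000, n_sims=1000)` would mirror the scale of the paper's Figure 4 comparison, at the cost of a longer run time.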