Continuous-in-time Limit for Bayesian Bandits

Authors: Yuhua Zhu, Zachary Izzo, Lexing Ying

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, titled 'Numerical experiments', the paper states: 'We compare the performance of the approximate Bayes-optimal policy (Algorithm 1) with Thompson sampling and UCB in terms of the expected regret.' It also presents 'Figure 4: The above plot shows the expected regret in the K-armed normal bandit problem for the approximate Bayes-optimal policy, Thompson sampling, and UCB.' This indicates empirical evaluation and data analysis.
Researcher Affiliation | Academia | All listed authors are affiliated with academic institutions: 'University of California San Diego' and 'Stanford University'. Their email addresses use the '.edu' domain, confirming academic affiliations.
Pseudocode | Yes | The paper includes 'Algorithm 1 Approximate Bayes-optimal policy' in Section 4.2, providing structured steps for their proposed method.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide links to any code repositories. The provided URL refers to the paper's attribution requirements.
Open Datasets | No | The paper discusses the 'K-armed normal bandit problem' and 'linear bandits' within the context of numerical experiments. These are problem settings for simulations rather than specific, pre-existing datasets. No external or publicly available datasets are mentioned, nor are any links, DOIs, or citations provided for data access.
Dataset Splits | No | The paper describes simulation setups, such as 'The horizon is set to be n = 10^3' and 'The expected regret is averaged over 10^3 simulations.' Since the experiments are based on simulations rather than pre-existing datasets, the concept of explicit train/test/validation splits does not apply. No such splits are mentioned for any data used.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the numerical experiments, such as GPU models, CPU types, or memory specifications. It only mentions general aspects of computational cost.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers, such as programming languages, libraries, or solvers used for implementation or numerical solutions. It only describes the mathematical framework and algorithms.
Experiment Setup | Yes | Section 6 'Numerical experiments' provides specific setup details: 'The horizon is set to be n = 10^3.' and 'The initial prior measure for both the Bayes-optimal policy and Thompson sampling is ν_k ∼ N(1/n, 1/n) for all k. This implies that the limiting HJB equation is (11) with µ̂_k(s, q) = (s_k + 1)/(q_k + 1) and σ̂ ≡ 1. In addition, δ = n^2 for the UCB algorithm.' It also states for linear bandits: 'We solve the limiting HJB equation by the numerical scheme (31)-(32) with δt = δq = 1/N, δs = 1/N, and N = 100.'
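The experiment quoted above (K-armed normal bandit, horizon n = 10^3, expected regret averaged over repeated simulations) can be sketched as a small simulation. This is a minimal illustrative sketch, not the authors' code: it assumes unit-variance Gaussian rewards, reads the quoted prior as a Gaussian with mean 1/n and variance 1/n, and uses a standard UCB bonus of sqrt(2 log(1/δ)/T_k) with an assumed confidence level δ = 1/n². All function names are hypothetical, and the paper's approximate Bayes-optimal policy (which requires solving the limiting HJB equation) is omitted.

```python
import numpy as np

def run_bandit(policy, theta, n, rng, mu0, s2_0):
    """Simulate one K-armed normal bandit run; return cumulative regret."""
    K = len(theta)
    best = theta.max()
    prec = np.full(K, 1.0 / s2_0)       # posterior precisions (conjugate Gaussian)
    pw_sum = np.full(K, mu0 / s2_0)     # precision-weighted reward sums
    counts = np.zeros(K)
    means = np.zeros(K)
    regret = 0.0
    for t in range(n):
        if policy == "thompson":
            # sample each arm's mean from its Gaussian posterior, pull the argmax
            k = int(np.argmax(rng.normal(pw_sum / prec, 1.0 / np.sqrt(prec))))
        else:  # UCB: pull each arm once, then use a confidence bonus
            if t < K:
                k = t
            else:
                delta = 1.0 / n**2      # assumed confidence level (see lead-in)
                bonus = np.sqrt(2.0 * np.log(1.0 / delta) / counts)
                k = int(np.argmax(means + bonus))
        r = rng.normal(theta[k], 1.0)   # unit-variance reward (assumption)
        prec[k] += 1.0                  # conjugate update for unit-variance likelihood
        pw_sum[k] += r
        counts[k] += 1.0
        means[k] += (r - means[k]) / counts[k]
        regret += best - theta[k]       # per-step expected regret of the pulled arm
    return regret

def expected_regret(policy, K=2, n=1000, n_sims=100, seed=0):
    """Expected regret averaged over n_sims independent simulations."""
    rng = np.random.default_rng(seed)
    mu0 = s2_0 = 1.0 / n                # prior N(1/n, 1/n), as quoted above
    return float(np.mean([
        run_bandit(policy, rng.normal(0.0, 1.0, K), n, rng, mu0, s2_0)
        for _ in range(n_sims)
    ]))
```

Calling `expected_regret("thompson")` and `expected_regret("ucb")` reproduces the kind of regret comparison plotted in the paper's Figure 4, up to the omitted Bayes-optimal baseline and any details of the true arm-mean distribution, which the sketch takes to be standard normal.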