Continuous-in-time Limit for Bayesian Bandits
Authors: Yuhua Zhu, Zachary Izzo, Lexing Ying
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, titled 'Numerical experiments', the paper states: 'We compare the performance of the approximate Bayes-optimal policy (Algorithm 1) with Thompson sampling and UCB in terms of the expected regret.' It also presents 'Figure 4: The above plot shows the expected regret in the K-armed normal bandit problem for the approximate Bayes-optimal policy, Thompson sampling, and UCB.' This indicates empirical evaluation and data analysis. |
| Researcher Affiliation | Academia | All listed authors are affiliated with academic institutions: 'University of California San Diego' and 'Stanford University'. Their email addresses use the '.edu' domain, confirming academic affiliations. |
| Pseudocode | Yes | The paper includes 'Algorithm 1 Approximate Bayes-optimal policy' in Section 4.2, providing structured steps for their proposed method. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide links to any code repositories. The provided URL refers to the paper's attribution requirements. |
| Open Datasets | No | The paper discusses 'K-armed normal bandit problem' and 'linear bandits' within the context of numerical experiments. These are problem settings for simulations rather than specific, pre-existing datasets. No external or publicly available datasets are mentioned, nor are any links, DOIs, or citations provided for data access. |
| Dataset Splits | No | The paper describes simulation setups, such as 'The horizon is set to be n = 10^3' and 'The expected regret is averaged over 10^3 simulations.' Since the experiments are based on simulations rather than pre-existing datasets, the concept of explicit train/test/validation splits does not apply. No such splits are mentioned for any data used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the numerical experiments, such as GPU models, CPU types, or memory specifications. It only mentions general aspects of computational cost. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers, such as programming languages, libraries, or solvers used for implementation or numerical solutions. It only describes the mathematical framework and algorithms. |
| Experiment Setup | Yes | Section 6 'Numerical experiments' provides specific setup details: 'The horizon is set to be n = 10^3' and 'The initial prior measure for both the Bayes-optimal policy and Thompson sampling is ν_k ~ N(1/n, 1/n) for all k. This implies that the limiting HJB equation is (11) with µ̂_k(s, q) = (s_k + 1)/(q_k + 1) and σ̂ ≡ 1. In addition, δ = n^{-2} for the UCB algorithm.' For linear bandits it states: 'We solve the limiting HJB equation by the numerical scheme (31)-(32) with δt = δq = 1/N, δs = 1/N and N = 100.' |
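Since the paper releases no code, the baseline comparison described above can only be approximated. Below is a minimal sketch of the two *benchmark* policies (Thompson sampling and UCB) on a K-armed normal bandit with the quoted horizon-style setup; it does not implement the paper's Algorithm 1, which requires numerically solving the limiting HJB equation. All function names, the standard-normal prior on arm means, and the reduced simulation counts are assumptions, not the paper's exact configuration.

```python
import numpy as np

def simulate_regret(policy, K=2, n=1000, n_sims=100, seed=0):
    """Average expected regret of a bandit policy on a K-armed normal
    bandit: unit-variance rewards, arm means drawn from a N(0, 1) prior
    (an assumed stand-in for the paper's prior)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_sims):
        mu = rng.normal(0.0, 1.0, size=K)  # true arm means for this run
        counts = np.zeros(K)               # q_k: pulls of each arm
        sums = np.zeros(K)                 # s_k: summed rewards of each arm
        regret = 0.0
        for t in range(1, n + 1):
            k = policy(sums, counts, t, rng)
            reward = rng.normal(mu[k], 1.0)
            sums[k] += reward
            counts[k] += 1
            regret += mu.max() - mu[k]     # per-step expected regret
        total += regret
    return total / n_sims

def thompson(sums, counts, t, rng):
    # Conjugate posterior under a N(0, 1) prior and unit-variance rewards:
    # mean s_k / (q_k + 1), variance 1 / (q_k + 1); sample and play argmax.
    post_mean = sums / (counts + 1.0)
    post_var = 1.0 / (counts + 1.0)
    return int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))

def ucb(sums, counts, t, rng, delta=1e-6):
    # UCB(delta) index; the paper's quoted choice corresponds to
    # delta = n^{-2}. Each arm is played once before indices are used.
    if np.any(counts == 0):
        return int(np.argmin(counts))
    bonus = np.sqrt(2.0 * np.log(1.0 / delta) / counts)
    return int(np.argmax(sums / counts + bonus))
```

With `n = 10^3` and averaging over `10^3` simulations, `simulate_regret(thompson, n=1000, n_sims=1000)` would mirror the scale of the paper's Figure 4 comparison, at the cost of a longer run time.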