An Information-Theoretic Analysis of Thompson Sampling
Authors: Daniel Russo, Benjamin Van Roy
JMLR 2016 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance. |
| Researcher Affiliation | Academia | Daniel Russo EMAIL Department of Management Science and Engineering Stanford University Stanford, California 94305 Benjamin Van Roy EMAIL Departments of Management Science and Engineering and Electrical Engineering Stanford University Stanford, California 94305 |
| Pseudocode | Yes | Algorithm 1 Linear Gaussian Thompson Sampling |
| Open Source Code | No | The paper does not provide concrete access to source code. It mentions "forthcoming work of ours leverages this to produce an algorithm that outperforms Thompson sampling," which refers to future work rather than code released for the present paper. |
| Open Datasets | No | The paper is a theoretical analysis of Thompson sampling and regret bounds. It discusses problem settings like "classical multi armed bandit problem" and "online linear optimization problems" but does not describe empirical experiments using specific datasets, nor does it provide access information for any datasets. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical analysis and regret bounds. It does not describe any empirical experiments or the use of specific datasets, therefore, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is a theoretical analysis and does not describe any empirical experiments or computational implementations that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper focuses on theoretical analysis and algorithms, without discussing specific implementations or experimental setups. Therefore, it does not list any software dependencies with version numbers. |
| Experiment Setup | No | The paper presents a theoretical analysis and derives regret bounds. It does not describe any empirical experiments, and consequently, there are no details provided regarding experimental setup, hyperparameters, or training configurations. |