Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1]
Is Q-Learning Provably Efficient?
Authors: Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret O(H3SAT), where S and A are the numbers of states and actions, H is the number of steps per episode, and T is the total number of steps. This sample efficiency matches the optimal regret that can be achieved by any model-based approach, up to a single H factor. To the best of our knowledge, this is the first analysis in the model-free setting that establishes T regret without requiring access to a simulator. |
| Researcher Affiliation | Collaboration | Chi Jin University of California, Berkeley EMAIL Zeyuan Allen-Zhu Microsoft Research, Redmond EMAIL Sebastien Bubeck Microsoft Research, Redmond EMAIL Michael I. Jordan University of California, Berkeley EMAIL |
| Pseudocode | Yes | Algorithm 1 Q-learning with UCB-Hoeffding |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This paper is theoretical and focuses on mathematical proofs and analysis of Q-learning, not empirical evaluation using specific datasets. Therefore, no dataset information is provided. |
| Dataset Splits | No | This paper is theoretical and does not involve empirical experiments with dataset splits. |
| Hardware Specification | No | This paper is theoretical and does not report on experiments requiring hardware specifications. |
| Software Dependencies | No | This paper is theoretical and does not report on experiments requiring specific software dependencies and their versions. |
| Experiment Setup | No | This paper is theoretical and does not involve empirical experiments with specific setup details or hyperparameters. |