Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Efficiency in Hierarchical Reinforcement Learning
Authors: Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well-repeating "sub-MDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient. In this paper, we present two general results which highlight the types of problems in which HRL is expected to provide benefits, in terms of planning speed, as well as in terms of statistical efficiency. |
| Researcher Affiliation | Industry | Zheng Wen (DeepMind), Doina Precup (DeepMind), Morteza Ibrahimi (DeepMind), Andre Barreto (DeepMind), Benjamin Van Roy (DeepMind), Satinder Singh (DeepMind) |
| Pseudocode | Yes | Algorithm 1: PSRL with a Planner, Sampler, and Inferer; Algorithm 2: Planning with Exit Profiles (PEP). A generic PSRL sketch is given after the table. |
| Open Source Code | No | The paper is a theoretical investigation and does not mention providing open-source code for the described methodology. It refers to 'Behaviour suite for reinforcement learning' by other authors but does not state that its own code is available. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments using datasets. Therefore, it does not provide information about publicly available datasets or access to them. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments. Thus, it does not provide information on training/test/validation dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not describe experiments. Therefore, it does not provide hardware specifications. |
| Software Dependencies | No | This is a theoretical paper and does not describe experiments. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments. Therefore, it does not provide details about an experimental setup, hyperparameters, or training settings. |
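
The paper's Algorithm 1 is a posterior-sampling (Thompson-sampling) scheme parameterized by a Planner, a Sampler, and an Inferer. For orientation only, the sketch below shows the generic flat, tabular PSRL loop that this family of algorithms builds on; it is not the paper's hierarchical variant, and the Dirichlet/empirical-mean posteriors, the `env` interface, and all names are assumptions made for illustration.

```python
import numpy as np

def psrl_episode(counts, reward_sums, reward_counts, env, horizon, rng):
    """One episode of generic (flat, tabular) PSRL:
    Sampler -- draw an MDP from the posterior;
    Planner -- solve the sampled MDP by value iteration;
    Inferer -- update the posterior with observed transitions.
    `env` is a hypothetical interface: reset() -> s, step(a) -> (s', r).
    """
    S, A = counts.shape[0], counts.shape[1]

    # Sampler: Dirichlet posterior over transitions, empirical mean for rewards.
    P = np.array([[rng.dirichlet(counts[s, a] + 1.0) for a in range(A)]
                  for s in range(S)])                      # shape (S, A, S)
    R = reward_sums / np.maximum(reward_counts, 1.0)       # shape (S, A)

    # Planner: finite-horizon value iteration on the sampled MDP.
    Q = np.zeros((horizon + 1, S, A))
    for h in range(horizon - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)                      # shape (S,)
        Q[h] = R + P @ V_next                              # Bellman backup

    # Act greedily w.r.t. the sampled model; Inferer updates the posterior.
    s = env.reset()
    for h in range(horizon):
        a = int(Q[h, s].argmax())
        s_next, r = env.step(a)
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
        reward_counts[s, a] += 1
        s = s_next
    return counts, reward_sums, reward_counts
```

Across episodes, repeated calls concentrate the posterior on the true dynamics. The paper's hierarchical version replaces the flat Sampler and Planner with components that exploit the repeated sub-MDP structure, which is what yields the stated finite-time regret bound.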