Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-MDP
Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide an option-dependent upper bound on the regret suffered by regret minimization algorithms in finite-horizon problems. We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure. Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself. Finally, we relax the assumption of having pre-trained options to show how, in particular situations, a hierarchical approach is still preferable to a standard one. Contributions: the paper's contributions can be summarized as follows. (1) We propose a new algorithm for the finite-horizon setting that exploits a set of fixed options (Sutton et al., 1999) to solve the HRL problem (Section 4). (2) We conduct a regret analysis of this algorithm, providing an option-dependent upper bound, which, to the best of our knowledge, is the first in the finite-horizon setting. This result could be used to define new objective functions for option-discovery methods that search for options minimizing this regret (Section 4.1). (3) For the sake of our analysis, we formulate the notion of a finite-horizon SMDP and a performance difference lemma for this setting (Sections 3, 7). |
| Researcher Affiliation | Academia | Gianluca Drappo Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL Alberto Maria Metelli Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL Marcello Restelli Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL |
| Pseudocode | Yes | Algorithm 1: UCRL-FH-SMDP (inputs: S, O with fixed policies, H, K); Algorithm 2: Extended Value Iteration for FH-SMDP (inputs: S, O, B^r_k, B^p_k); Algorithm 3: Option Learning (inputs: the option o and K_o); Algorithm 4: Two-Phase Algorithm (inputs: set of options O with random policies, K_o, and K) |
| Open Source Code | No | No explicit statement about the release of source code or a link to a code repository is provided in the paper. |
| Open Datasets | No | The paper focuses on theoretical analysis of regret minimization algorithms in semi-Markov Decision Processes and does not describe experiments using specific datasets, thus no information about public datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with specific datasets; therefore, no information regarding training, validation, or test splits is provided. |
| Hardware Specification | No | The paper is theoretical, presenting algorithms and their regret analysis without describing any computational experiments or hardware used for such experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithmic development and analysis, thus no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper provides a theoretical analysis of regret minimization algorithms and does not include details on experimental setup such as hyperparameters, model initialization, or training schedules. |
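The pseudocode row above names UCRL-style optimistic algorithms for finite-horizon SMDPs. As a rough illustration only (not the paper's actual method), the sketch below shows optimistic backward-induction value iteration on an estimated tabular model. All function and variable names are hypothetical; the Hoeffding-style bonus, the clipping at H, and the treatment of options as one-step actions (ignoring option durations, which are central to the paper's analysis) are simplifying assumptions.

```python
import numpy as np

def optimistic_value_iteration(n_states, n_options, H, counts,
                               rew_sum, trans_counts, delta=0.05):
    """Backward induction on an optimistic estimated FH model (sketch).

    counts[s, o]        -- visit counts of (state, option)
    rew_sum[s, o]       -- cumulative observed reward for (s, o)
    trans_counts[s,o,:] -- observed next-state counts for (s, o)

    Confidence bonuses loosely mimic UCRL-style bounds (the paper's
    B^r_k / B^p_k); constants here are illustrative, not the paper's.
    """
    V = np.zeros((H + 1, n_states))          # V[H] = 0 (terminal)
    log_term = np.log(2 * n_states * n_options * H / delta)
    for h in range(H - 1, -1, -1):
        Q = np.zeros((n_states, n_options))
        for s in range(n_states):
            for o in range(n_options):
                n = max(counts[s, o], 1)
                r_hat = rew_sum[s, o] / n
                denom = max(trans_counts[s, o].sum(), 1)
                p_hat = trans_counts[s, o] / denom
                # optimism: Hoeffding-style reward bonus, plus a
                # horizon-scaled bonus standing in for the L1
                # transition uncertainty; clip at the max return H
                bonus = np.sqrt(log_term / (2 * n))
                Q[s, o] = min(r_hat + bonus + p_hat @ V[h + 1]
                              + H * bonus, H)
        V[h] = Q.max(axis=1)                 # greedy optimistic value
    return V

# Toy usage on random counts (rewards assumed in [0, 1])
rng = np.random.default_rng(0)
nS, nO, H = 3, 2, 4
counts = rng.integers(1, 10, size=(nS, nO))
rew_sum = rng.random((nS, nO)) * counts
trans = rng.integers(0, 5, size=(nS, nO, nS))
V = optimistic_value_iteration(nS, nO, H, counts, rew_sum, trans)
```

In a full UCRL-style loop, this optimistic planning step would alternate with executing the greedy policy and updating the counts each episode; the paper's version additionally plans over options of random duration, which is what yields the option-dependent regret bound.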