Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-MDP
Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide an option-dependent upper bound on the regret suffered by regret minimization algorithms in finite-horizon problems. We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure. Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself. Finally, we relax the assumption of having pre-trained options to show how, in particular situations, a hierarchical approach is still preferable to a standard one. Contributions: the paper's contributions can be summarized as follows. (1) We propose a new algorithm for the finite-horizon setting that exploits a set of fixed options (Sutton et al., 1999) to solve the HRL problem (Section 4). (2) We conduct a regret analysis of this algorithm, providing an option-dependent upper bound, which, to the best of our knowledge, is the first in the finite-horizon setting. This result could be used to define new objective functions for option-discovery methods that search for options minimizing this regret (Section 4.1). (3) For the sake of our analysis, we formulate the notion of a finite-horizon SMDP and a performance difference lemma for this setting (Sections 3, 7). |
| Researcher Affiliation | Academia | Gianluca Drappo Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL Alberto Maria Metelli Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL Marcello Restelli Department of Electronics, Information and Bioengineering Politecnico di Milano EMAIL |
| Pseudocode | Yes | Algorithm 1: UCRL-FH-SMDP (inputs: S, O with fixed policies, H, K); Algorithm 2: Extended Value Iteration for FH-SMDP (inputs: S, O, B^r_k, B^p_k); Algorithm 3: Option Learning (inputs: the option o and K_o); Algorithm 4: Two-Phase Algorithm (inputs: set of options O with random policies, K_o, and K) |
| Open Source Code | No | No explicit statement about the release of source code or a link to a code repository is provided in the paper. |
| Open Datasets | No | The paper focuses on theoretical analysis of regret minimization algorithms in semi-Markov Decision Processes and does not describe experiments using specific datasets, thus no information about public datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with specific datasets; therefore, no information regarding training, validation, or test splits is provided. |
| Hardware Specification | No | The paper is theoretical, presenting algorithms and their regret analysis without describing any computational experiments or hardware used for such experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithmic development and analysis, thus no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper provides a theoretical analysis of regret minimization algorithms and does not include details on experimental setup such as hyperparameters, model initialization, or training schedules. |
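The pseudocode row above names UCRL-style optimistic algorithms for finite-horizon SMDPs. As a rough illustration only (not the paper's actual method), the sketch below shows optimistic backward-induction value iteration on an estimated tabular model. All function and variable names are hypothetical; the Hoeffding-style bonus, the clipping at H, and the treatment of options as one-step actions (ignoring option durations, which are central to the paper's analysis) are simplifying assumptions.

```python
import numpy as np

def optimistic_value_iteration(n_states, n_options, H, counts,
                               rew_sum, trans_counts, delta=0.05):
    """Backward induction on an optimistic estimated FH model (sketch).

    counts[s, o]        -- visit counts of (state, option)
    rew_sum[s, o]       -- cumulative observed reward for (s, o)
    trans_counts[s,o,:] -- observed next-state counts for (s, o)

    Confidence bonuses loosely mimic UCRL-style bounds (the paper's
    B^r_k / B^p_k); constants here are illustrative, not the paper's.
    """
    V = np.zeros((H + 1, n_states))          # V[H] = 0 (terminal)
    log_term = np.log(2 * n_states * n_options * H / delta)
    for h in range(H - 1, -1, -1):
        Q = np.zeros((n_states, n_options))
        for s in range(n_states):
            for o in range(n_options):
                n = max(counts[s, o], 1)
                r_hat = rew_sum[s, o] / n
                denom = max(trans_counts[s, o].sum(), 1)
                p_hat = trans_counts[s, o] / denom
                # optimism: Hoeffding-style reward bonus, plus a
                # horizon-scaled bonus standing in for the L1
                # transition uncertainty; clip at the max return H
                bonus = np.sqrt(log_term / (2 * n))
                Q[s, o] = min(r_hat + bonus + p_hat @ V[h + 1]
                              + H * bonus, H)
        V[h] = Q.max(axis=1)                 # greedy optimistic value
    return V

# Toy usage on random counts (rewards assumed in [0, 1])
rng = np.random.default_rng(0)
nS, nO, H = 3, 2, 4
counts = rng.integers(1, 10, size=(nS, nO))
rew_sum = rng.random((nS, nO)) * counts
trans = rng.integers(0, 5, size=(nS, nO, nS))
V = optimistic_value_iteration(nS, nO, H, counts, rew_sum, trans)
```

In a full UCRL-style loop, this optimistic planning step would alternate with executing the greedy policy and updating the counts each episode; the paper's version additionally plans over options of random duration, which is what yields the option-dependent regret bound.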