Regret Bounds for Reinforcement Learning via Markov Chain Concentration
Authors: Ronald Ortner
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give a simple optimistic algorithm for which it is easy to derive regret bounds of O(√(t_mix · S · A · T)) after T steps in uniformly ergodic Markov decision processes with S states, A actions, and mixing time parameter t_mix. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter. The proof of the regret bound is much simpler than for bounds achieved before and relies on concentration results for Markov chains. |
| Researcher Affiliation | Academia | Ronald Ortner EMAIL Department Mathematik und Informationstechnologie, Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria |
| Pseudocode | Yes | Algorithm 1 Optimistic Sample Path (Osp) and Algorithm 2 Path Construction |
| Open Source Code | No | However, at the moment this is not more than an idea for future research and the details of such an algorithm are yet to be developed. |
| Open Datasets | No | The paper is theoretical and focuses on deriving regret bounds for reinforcement learning in Markov decision processes. It does not describe any experiments that would use a specific dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation on datasets, thus no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or empirical evaluation, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper describes a theoretical algorithm and its regret analysis, but does not provide details on specific software implementations or their version numbers. |
| Experiment Setup | No | The paper focuses on theoretical regret bounds and algorithm design for reinforcement learning, without detailing any experimental setup or hyperparameters. |
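The paper is classified above as purely theoretical, so no reference implementation exists. As background for the t_mix parameter appearing in the reported regret bound, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of computing the mixing time of a toy two-state Markov chain, defined as the smallest t at which every row of P^t is within total-variation distance 1/4 of the stationary distribution:

```python
import numpy as np

# Hypothetical toy chain (not from the paper): a two-state transition matrix.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution pi: left eigenvector of P for eigenvalue 1,
# normalized to sum to one.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

def max_tv_distance(P, pi, t):
    """Worst-case total-variation distance of the rows of P^t from pi."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[s] - pi).sum() for s in range(len(pi)))

# t_mix: smallest t with max_s ||P^t(s, .) - pi||_TV <= 1/4.
t_mix = next(t for t in range(1, 1000) if max_tv_distance(P, pi, t) <= 0.25)
print(t_mix)
```

For a uniformly ergodic chain this search always terminates; the bound in the paper scales with this quantity alongside S, A, and T.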