A Comprehensive Survey on Safe Reinforcement Learning
Authors: Javier García, Fernando Fernández
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | A Comprehensive Survey on Safe Reinforcement Learning. Javier García, Fernando Fernández. Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain. Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning. |
| Researcher Affiliation | Academia | Javier García, Fernando Fernández. Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain |
| Pseudocode | No | The paper describes various algorithms and their mathematical formulations, such as the Q̂-learning algorithm and β-pessimistic Q-learning, using equations like "Q̂(s_t, a_t) = min(Q̂(s_t, a_t), r_{t+1} + γ max_{a_{t+1} ∈ A} Q̂(s_{t+1}, a_{t+1}))". However, it does not present any structured pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | No | The paper is a comprehensive survey of existing literature on Safe Reinforcement Learning. It does not introduce a new methodology for which source code would be provided. Therefore, there is no statement about releasing code or a link to a code repository for the work described in this paper. |
| Open Datasets | No | The paper is a survey and analyzes existing literature, often referring to environments or problems used in other research (e.g., "stochastic cliffworld environment", "helicopter hovering control task", "Grid-World domain"). However, it does not conduct its own experiments using a specific dataset, nor does it provide access information for any dataset related to its own methodology. |
| Dataset Splits | No | The paper is a survey of existing literature and does not conduct its own experiments. Therefore, it does not specify any training/test/validation dataset splits for its own work. While it mentions concepts like "5-fold cross-validation" in Section 3.3, this is in reference to methods discussed in other papers, not an experimental setup for the survey itself. |
| Hardware Specification | No | The paper is a literature survey and does not involve running experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is a literature survey and does not involve implementing algorithms or running experiments. Therefore, it does not list any specific software dependencies or version numbers. |
| Experiment Setup | No | The paper is a comprehensive survey on Safe Reinforcement Learning and focuses on categorizing and analyzing existing literature. It does not conduct original experiments or propose a new model, and therefore, does not describe any experimental setup details such as hyperparameters or training configurations. |
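For reference, the Q̂-learning update quoted in the Pseudocode row above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the state/action counts, discount factor, initial value, and example transition below are all hypothetical.

```python
import numpy as np

# Sketch of the pessimistic Q-hat backup quoted in the survey:
#   Q(s, a) <- min(Q(s, a), r + gamma * max_a' Q(s', a'))
# Q-hat is initialized to a high value and only ever decreases,
# so it tracks a worst-case estimate of the action value.

n_states, n_actions = 4, 2   # hypothetical problem size
gamma = 0.9                  # hypothetical discount factor

# Optimistic initialization; updates can only lower these values.
Q = np.full((n_states, n_actions), 10.0)

def q_hat_update(Q, s, a, r, s_next):
    """Apply one Q-hat backup for the transition (s, a, r, s_next)."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] = min(Q[s, a], target)
    return Q

# Example transition: from state 0, taking action 1 yields reward -1
# and lands in state 2.
Q = q_hat_update(Q, s=0, a=1, r=-1.0, s_next=2)
print(Q[0, 1])  # 8.0 = min(10, -1 + 0.9 * 10)
```

Because the update takes a `min` rather than the usual stochastic average, repeated backups drive Q̂ toward the minimax (worst-case) return rather than the expected return, which is the safety mechanism the survey attributes to this line of work.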