A Comprehensive Survey on Safe Reinforcement Learning
Authors: Javier García, Fernando Fernández
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | A Comprehensive Survey on Safe Reinforcement Learning. Javier García, Fernando Fernández. Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain. Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning. |
| Researcher Affiliation | Academia | Javier García, Fernando Fernández. Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain |
| Pseudocode | No | The paper describes various algorithms and their mathematical formulations, such as the Q̂-learning algorithm and β-pessimistic Q-learning, using equations like "Q̂(s_t, a_t) = min(Q̂(s_t, a_t), r_{t+1} + γ max_{a_{t+1} ∈ A} Q̂(s_{t+1}, a_{t+1}))". However, it does not present any structured pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | No | The paper is a comprehensive survey of existing literature on Safe Reinforcement Learning. It does not introduce a new methodology for which source code would be provided. Therefore, there is no statement about releasing code or a link to a code repository for the work described in this paper. |
| Open Datasets | No | The paper is a survey and analyzes existing literature, often referring to environments or problems used in other research (e.g., "stochastic cliffworld environment", "helicopter hovering control task", "Grid-World domain"). However, it does not conduct its own experiments using a specific dataset, nor does it provide access information for any dataset related to its own methodology. |
| Dataset Splits | No | The paper is a survey of existing literature and does not conduct its own experiments. Therefore, it does not specify any training/test/validation dataset splits for its own work. While it mentions concepts like "5-fold cross-validation" in Section 3.3, this is in reference to methods discussed in other papers, not an experimental setup for the survey itself. |
| Hardware Specification | No | The paper is a literature survey and does not involve running experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is a literature survey and does not involve implementing algorithms or running experiments. Therefore, it does not list any specific software dependencies or version numbers. |
| Experiment Setup | No | The paper is a comprehensive survey on Safe Reinforcement Learning and focuses on categorizing and analyzing existing literature. It does not conduct original experiments or propose a new model, and therefore, does not describe any experimental setup details such as hyperparameters or training configurations. |
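For reference, the Q̂-learning update quoted in the Pseudocode row above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the state/action counts, discount factor, initial value, and example transition below are all hypothetical.

```python
import numpy as np

# Sketch of the pessimistic Q-hat backup quoted in the survey:
#   Q(s, a) <- min(Q(s, a), r + gamma * max_a' Q(s', a'))
# Q-hat is initialized to a high value and only ever decreases,
# so it tracks a worst-case estimate of the action value.

n_states, n_actions = 4, 2   # hypothetical problem size
gamma = 0.9                  # hypothetical discount factor

# Optimistic initialization; updates can only lower these values.
Q = np.full((n_states, n_actions), 10.0)

def q_hat_update(Q, s, a, r, s_next):
    """Apply one Q-hat backup for the transition (s, a, r, s_next)."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] = min(Q[s, a], target)
    return Q

# Example transition: from state 0, taking action 1 yields reward -1
# and lands in state 2.
Q = q_hat_update(Q, s=0, a=1, r=-1.0, s_next=2)
print(Q[0, 1])  # 8.0 = min(10, -1 + 0.9 * 10)
```

Because the update takes a `min` rather than the usual stochastic average, repeated backups drive Q̂ toward the minimax (worst-case) return rather than the expected return, which is the safety mechanism the survey attributes to this line of work.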