A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
Authors: Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Thomas Mesnard, Hado van Hasselt, Olivier Pietquin, Laura Toni
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this survey, we review the state of the art of Temporal Credit Assignment (CA) in deep RL. We propose a unifying formalism for credit that enables equitable comparisons of state-of-the-art algorithms and improves our understanding of the trade-offs between the various methods. We cast the CAP as the problem of learning the influence of an action over an outcome from a finite amount of experience. We discuss the challenges posed by delayed effects, transpositions, and a lack of action influence, and analyse how existing methods aim to address them. Finally, we survey the protocols to evaluate a credit assignment method and suggest ways to diagnose the sources of struggle for different methods. |
| Researcher Affiliation | Collaboration | Eduardo Pignatelli (EMAIL), University College London; Johan Ferret (EMAIL), Google DeepMind; Hado van Hasselt (EMAIL), Google DeepMind; Matthieu Geist (EMAIL), Google DeepMind; Thomas Mesnard (EMAIL), Google DeepMind; Olivier Pietquin (EMAIL), Google DeepMind; Laura Toni (EMAIL), University College London |
| Pseudocode | No | The paper is a comprehensive survey of methods for temporal credit assignment in deep RL and does not include structured pseudocode or algorithm blocks for any new method proposed by the authors. |
| Open Source Code | No | The paper discusses the challenges of reproducibility and open-source code in the field, stating, "Many works propose open-source code, but experiments are often not reproducible, their code is hard to read, hard to run and hard to understand." However, it provides no statement about or link to source code for the methodology or analysis presented in this survey itself. |
| Open Datasets | No | The paper is a survey and does not conduct its own experiments on specific datasets. Section 7.2 ("Diagnostic tasks" and "Tasks at scale") mentions environments and benchmarks from the literature that are typically used to evaluate RL agents (e.g., Atari, VizDoom, BoxWorld), but the authors neither use these for their own analysis nor provide access information for them. |
| Dataset Splits | No | The paper is a survey and does not conduct its own experiments with datasets. Therefore, it does not provide details on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is a survey and does not conduct its own experiments. Therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is a survey and does not conduct its own experiments. Therefore, it does not specify any software dependencies with version numbers used for implementing or running experiments. |
| Experiment Setup | No | The paper is a survey of existing research and does not present new experimental results. Consequently, it does not include an experimental setup section with specific hyperparameters, training configurations, or system-level settings. |