A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
Authors: Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Thomas Mesnard, Hado van Hasselt, Olivier Pietquin, Laura Toni
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this survey, we review the state of the art of Temporal Credit Assignment (CA) in deep RL. We propose a unifying formalism for credit that enables equitable comparisons of state-of-the-art algorithms and improves our understanding of the trade-offs between the various methods. We cast the CAP as the problem of learning the influence of an action over an outcome from a finite amount of experience. We discuss the challenges posed by delayed effects, transpositions, and a lack of action influence, and analyse how existing methods aim to address them. Finally, we survey the protocols to evaluate a credit assignment method and suggest ways to diagnose the sources of struggle for different methods. |
| Researcher Affiliation | Collaboration | Eduardo Pignatelli (EMAIL), University College London; Johan Ferret (EMAIL), Google DeepMind; Hado van Hasselt (EMAIL), Google DeepMind; Matthieu Geist (EMAIL), Google DeepMind; Thomas Mesnard (EMAIL), Google DeepMind; Olivier Pietquin (EMAIL), Google DeepMind; Laura Toni (EMAIL), University College London |
| Pseudocode | No | The paper is a comprehensive survey of methods for temporal credit assignment in deep RL and does not include structured pseudocode or algorithm blocks for any new method proposed by the authors. |
| Open Source Code | No | The paper discusses the challenges of reproducibility and open-source code in the field, stating, "Many works propose open-source code, but experiments are often not reproducible, their code is hard to read, hard to run and hard to understand." However, it provides no statement about or link to source code for the methodology or analysis presented in this survey itself. |
| Open Datasets | No | The paper is a survey and does not conduct its own experiments on specific datasets. Section 7.2 ("Diagnostic tasks" and "Tasks at scale") mentions environments and benchmarks from the literature that are typically used to evaluate RL agents (e.g., Atari, VizDoom, BoxWorld), but the authors neither use these for their own analysis nor provide access information for them. |
| Dataset Splits | No | The paper is a survey and does not conduct its own experiments with datasets. Therefore, it does not provide details on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is a survey and does not conduct its own experiments. Therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is a survey and does not conduct its own experiments. Therefore, it does not specify any software dependencies with version numbers used for implementing or running experiments. |
| Experiment Setup | No | The paper is a survey of existing research and does not present new experimental results. Consequently, it does not include an experimental setup section with specific hyperparameters, training configurations, or system-level settings. |