reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When Should Reinforcement Learning Use Causal Reasoning?

Authors: Oliver Schulte, Pascal Poupart

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper provides a theoretical study examining which reinforcement learning settings we can expect to benefit from causal reasoning, and how. According to our analysis, the key factor is whether the behavioral policy which generates the data can be executed by the learning agent, meaning that the observation signal available to the learning agent comprises all observations used by the behavioral policy. This paper can therefore serve as a short tutorial on causal modeling for RL researchers. Our analysis follows a ladder of causation as described by (Pearl, 2000): A hierarchy of probabilistic statements that require causal reasoning of increasing complexity. The levels correspond to associations, interventions, and counterfactuals.
Researcher Affiliation	Academia	Oliver Schulte: School of Computing Science, Simon Fraser University, Vancouver, Canada. Pascal Poupart: Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; Vector Institute, Toronto, Canada.
Pseudocode	No	The paper describes conceptual and theoretical frameworks, definitions, and mathematical equations (e.g., Bellman equations, structural equations), but does not present any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository.
Open Datasets	No	The paper uses illustrative examples (e.g., "simple sports setting," "driving scenario," "frozen lake") to explain theoretical concepts and does not use or refer to any publicly available or open datasets for empirical evaluation.
Dataset Splits	No	No empirical datasets are used in this theoretical paper, therefore there is no mention of dataset splits.
Hardware Specification	No	No empirical experiments were conducted as part of this theoretical study, thus no hardware specifications are provided.
Software Dependencies	No	No empirical experiments were conducted that would require specific software dependencies with version numbers.
Experiment Setup	No	This paper is a theoretical study and does not describe any specific experimental setup, hyperparameters, or training configurations for empirical evaluation.