Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning
Authors: Lucas Lehnert, Michael L. Littman
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This article introduces a new model, called the Linear Successor Feature Model (LSFM), and presents results demonstrating that LSFMs learn latent state spaces that support model-based RL. ... Then, Section 5 presents the link to model-free learning and presents a sequence of examples and simulations illustrating to what extent the representations learned with LSFMs generalize across tasks with different transitions, rewards, and optimal policies. ... Figure 6 presents the puddle-world experiments and the results. ... Figure 8 presents a transfer experiment highlighting that reusing a previously learned reward-predictive state representation allows an intelligent agent to learn an optimal policy using less data. ... Figure 9 presents the last simulation result illustrating which aspect of an MDP reward-predictive state representations encode. ... Appendix C. Experiment Design |
| Researcher Affiliation | Academia | Lucas Lehnert EMAIL Computer Science Department Carney Institute for Brain Science Brown University Providence, RI 02912, USA. Michael L. Littman EMAIL Computer Science Department Brown University Providence, RI 02912, USA |
| Pseudocode | No | The paper includes mathematical equations for algorithms (e.g., Equation 42, 46 for parameter updates), but does not present any structured pseudocode blocks or figures explicitly labeled as algorithms. |
| Open Source Code | No | The paper mentions "Software available from tensorflow.org" in a footnote. This refers to the TensorFlow library, which is a third-party tool used in the research, not the authors' specific implementation code for the methodology described in the paper. There is no explicit statement or link provided for the authors' own source code. |
| Open Datasets | No | The paper refers to environments such as the "puddle-world task" (Section 4.3) and "combination lock task" (Section 5.3) from which data is collected (e.g., "a transition data set was collected from the puddle-world task"). These are custom experimental setups or environments, not standard publicly available datasets with specific access information (link, DOI, repository, or formal citation). |
| Dataset Splits | No | The paper describes generating data from custom environments, such as collecting "a transition data set" (Section 4.3) or "a data set DB" (Section 5.2) of varying or fixed sizes. It mentions, "For each data set size, twenty different data sets were sampled" (Figure 8(c)). This refers to data collection and sampling within the experimental environments rather than providing specific training/test/validation splits for a pre-existing, static dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, processor types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al., 2015)" and the "Adam optimizer (Kingma and Ba, 2014)" in Appendix C.1. While TensorFlow is a software framework, its version number is not specified, and Adam is an optimization algorithm rather than a versioned dependency. The paper therefore does not pin specific version numbers for key software components or libraries, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix C provides detailed experiment design, including specific hyperparameters. For example, Table 2 "Hyper-Parameter for Puddle-World Experiment" lists: Learning Rate (0.0005), αψ (0.01), αp (1.0), αN (0.1), Feature Dimension (80), Batch Size (50), Number of Training Transitions (10000), and Number of Gradient Steps (50000). Similar tables (Table 3 and Table 4) provide parameters for other experiments, such as learning rates, batch sizes, and gradient steps for LSFM and Fitted Q-iteration. |
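To make the reported setup concrete, here is a minimal sketch of how the Table 2 puddle-world hyper-parameters could be collected into a configuration. The dictionary layout and the `num_epochs` helper are assumptions for illustration, not the authors' code; only the numeric values come from the paper.

```python
# Hyper-parameters reported in Table 2 for the puddle-world experiment.
# Key names are illustrative; values are taken from the paper.
PUDDLE_WORLD_CONFIG = {
    "learning_rate": 0.0005,       # Adam step size
    "alpha_psi": 0.01,             # successor-feature loss weight
    "alpha_p": 1.0,                # reward-prediction loss weight
    "alpha_n": 0.1,                # norm-regularization loss weight
    "feature_dim": 80,             # latent state dimension
    "batch_size": 50,
    "num_transitions": 10_000,     # size of the transition data set
    "num_gradient_steps": 50_000,
}

def num_epochs(cfg):
    """Approximate number of passes over the data set implied by the schedule."""
    steps_per_epoch = cfg["num_transitions"] / cfg["batch_size"]
    return cfg["num_gradient_steps"] / steps_per_epoch
```

Under these numbers, the training schedule corresponds to roughly 250 passes over the 10,000-transition data set (200 batches per pass, 50,000 gradient steps total).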