Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning
Authors: Lucas Lehnert, Michael L. Littman
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This article introduces a new model, called the Linear Successor Feature Model (LSFM), and presents results demonstrating that LSFMs learn latent state spaces that support model-based RL. ... Then, Section 5 presents the link to model-free learning and presents a sequence of examples and simulations illustrating to what extent the representations learned with LSFMs generalize across tasks with different transitions, rewards, and optimal policies. ... Figure 6 presents the puddle-world experiments and the results. ... Figure 8 presents a transfer experiment highlighting that reusing a previously learned reward-predictive state representation allows an intelligent agent to learn an optimal policy using less data. ... Figure 9 presents the last simulation result illustrating which aspect of an MDP reward-predictive state representations encode. ... Appendix C. Experiment Design |
| Researcher Affiliation | Academia | Lucas Lehnert EMAIL Computer Science Department Carney Institute for Brain Science Brown University Providence, RI 02912, USA. Michael L. Littman EMAIL Computer Science Department Brown University Providence, RI 02912, USA |
| Pseudocode | No | The paper includes mathematical equations for algorithms (e.g., Equation 42, 46 for parameter updates), but does not present any structured pseudocode blocks or figures explicitly labeled as algorithms. |
| Open Source Code | No | The paper mentions "Software available from tensorflow.org" in a footnote. This refers to the TensorFlow library, which is a third-party tool used in the research, not the authors' specific implementation code for the methodology described in the paper. There is no explicit statement or link provided for the authors' own source code. |
| Open Datasets | No | The paper refers to environments such as the "puddle-world task" (Section 4.3) and "combination lock task" (Section 5.3) from which data is collected (e.g., "a transition data set was collected from the puddle-world task"). These are custom experimental setups or environments, not standard publicly available datasets with specific access information (link, DOI, repository, or formal citation). |
| Dataset Splits | No | The paper describes generating data from custom environments, such as collecting "a transition data set" (Section 4.3) or "a data set DB" (Section 5.2) of varying or fixed sizes. It mentions, "For each data set size, twenty different data sets were sampled" (Figure 8(c)). This refers to data collection and sampling within the experimental environments rather than providing specific training/test/validation splits for a pre-existing, static dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, processor types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al., 2015)" and the "Adam optimizer (Kingma and Ba, 2014)" in Appendix C.1. While TensorFlow is a software framework, its version number is not specified, and Adam is an optimization algorithm rather than a versioned dependency. The paper therefore does not pin specific version numbers for key software components or libraries, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix C provides detailed experiment design, including specific hyperparameters. For example, Table 2 "Hyper-Parameter for Puddle-World Experiment" lists: Learning Rate (0.0005), αψ (0.01), αp (1.0), αN (0.1), Feature Dimension (80), Batch Size (50), Number of Training Transitions (10000), and Number of Gradient Steps (50000). Similar tables (Table 3 and Table 4) provide parameters for other experiments, such as learning rates, batch sizes, and gradient steps for LSFM and Fitted Q-iteration. |
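To make the reported setup concrete, here is a minimal sketch of how the Table 2 puddle-world hyper-parameters could be collected into a configuration. The dictionary layout and the `num_epochs` helper are assumptions for illustration, not the authors' code; only the numeric values come from the paper.

```python
# Hyper-parameters reported in Table 2 for the puddle-world experiment.
# Key names are illustrative; values are taken from the paper.
PUDDLE_WORLD_CONFIG = {
    "learning_rate": 0.0005,       # Adam step size
    "alpha_psi": 0.01,             # successor-feature loss weight
    "alpha_p": 1.0,                # reward-prediction loss weight
    "alpha_n": 0.1,                # norm-regularization loss weight
    "feature_dim": 80,             # latent state dimension
    "batch_size": 50,
    "num_transitions": 10_000,     # size of the transition data set
    "num_gradient_steps": 50_000,
}

def num_epochs(cfg):
    """Approximate number of passes over the data set implied by the schedule."""
    steps_per_epoch = cfg["num_transitions"] / cfg["batch_size"]
    return cfg["num_gradient_steps"] / steps_per_epoch
```

Under these numbers, the training schedule corresponds to roughly 250 passes over the 10,000-transition data set (200 batches per pass, 50,000 gradient steps total).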