Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
Authors: Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions are primarily theoretical and aim to provide a theoretical account of the performance of end-to-end model-based methods. To help in this matter, we also provide some empirical results in simple illustrative problems which serve to demonstrate properties derived from our analysis. (Section 4, Empirical results) |
| Researcher Affiliation | Academia | Clement Gehring, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology; Kenji Kawaguchi, Center of Mathematical Sciences and Applications, Harvard University; Jiaoyang Huang, Courant Institute of Mathematical Sciences, New York University; Leslie Pack Kaelbling, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All other implementation details, data and code are publicly available (https://github.com/gehring/implicit-estimators). |
| Open Datasets | Yes | We consider three simple, illustrative domains: a chain MDP, the four rooms domain [21], and the mountain car domain [13, 20], which we describe below. |
| Dataset Splits | No | The paper mentions "unseen trajectories" and states that it "generate[s] an additional test dataset," but it does not provide specific percentages or counts for training, validation, or test splits, nor does it cite a standard predefined split. |
| Hardware Specification | No | The paper states "We provide sufficient information to estimate this in the appendix as well as the hardware used," but no particular hardware (e.g., GPU/CPU models) is specified in the main text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we used a batch size k = 25. We ran these experiments with several combinations of learning rates and internal discounts but only present a few representative results here. |