Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
Authors: Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions are primarily theoretical and aim to provide a theoretical account of the performance of end-to-end model-based methods. To help in this matter, we also provide some empirical results in simple illustrative problems which serve to demonstrate properties derived from our analysis. (Section 4, Empirical results) |
| Researcher Affiliation | Academia | Clement Gehring, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology; Kenji Kawaguchi, Center of Mathematical Sciences and Applications, Harvard University; Jiaoyang Huang, Courant Institute of Mathematical Sciences, New York University; Leslie Pack Kaelbling, Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All other implementation details, data and code are publicly available (https://github.com/gehring/implicit-estimators). |
| Open Datasets | Yes | We consider three simple, illustrative domains: a chain MDP, the four rooms domain [21], and the mountain car domain [13, 20], which we describe below. |
| Dataset Splits | No | The paper mentions "unseen trajectories" and states that it "generate[s] an additional test dataset," but it does not provide specific percentages or counts for training, validation, or test splits, nor does it cite a standard predefined split. |
| Hardware Specification | No | The paper states "We provide sufficient information to estimate this in the appendix as well as the hardware used," but no particular hardware (e.g., GPU/CPU models) is specified in the main text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we used a batch size k = 25. We ran these experiments with several combinations of learning rates and internal discounts but only present a few representative results here. |