Policy Evaluation with Temporal Differences: A Survey and Comparison

Authors: Christoph Dann, Gerhard Neumann, Jan Peters

JMLR 2014

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | "By presenting the first extensive, systematic comparative evaluations comparing TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance."
Researcher Affiliation | Academia | Christoph Dann, Gerhard Neumann (Technische Universität Darmstadt, Karolinenplatz 5, 64289 Darmstadt, Germany); Jan Peters (Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany)
Pseudocode | Yes | Appendix C (Algorithms) provides pseudo-code listings of the update rules for all discussed temporal-difference algorithms. These updates are executed for each transition from s_t to s_{t+1}, performing action a_t and receiving reward r_t. The listings range from Algorithm 1 (TD(λ) Learning) to Algorithm 13 (residual-gradient algorithm without double samples).
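To make the per-transition update scheme concrete, here is a minimal TD(λ) sketch with linear function approximation. This is an illustrative reconstruction of the generic textbook update, not the paper's exact implementation; the feature tuples, step size `alpha`, trace decay `lam`, and discount `gamma` are assumed placeholders.

```python
import numpy as np

def td_lambda(transitions, n_features, alpha=0.1, lam=0.5, gamma=0.95):
    """Minimal TD(lambda) policy evaluation with linear features.

    transitions: iterable of (phi_t, r_t, phi_t1) tuples, where phi_t and
    phi_t1 are feature vectors of the states before and after a transition.
    """
    theta = np.zeros(n_features)   # value-function weights
    z = np.zeros(n_features)       # eligibility trace
    for phi_t, r_t, phi_t1 in transitions:
        # TD error for this transition
        delta = r_t + gamma * phi_t1 @ theta - phi_t @ theta
        # decay the trace and accumulate the current features
        z = gamma * lam * z + phi_t
        # TD(lambda) weight update
        theta += alpha * delta * z
    return theta
```

Each of the paper's thirteen listed algorithms replaces this inner update with its own rule (e.g., the least-squares matrix updates of LSTD, or the two-timescale corrections of GTD2/TDC).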
Open Source Code | Yes | All algorithms were implemented in Python. The source code for each method and experiment is available at http://github.com/chrodan/tdlearn.
Open Datasets | No | The paper describes several benchmark environments (e.g., the "14-State Boyan Chain" and "Baird's Star Example") and a "Randomly Sampled MDP", but provides no explicit links, DOIs, or citations to pre-collected datasets. The experimental setup involves simulating these environments or sampling the MDPs rather than loading static, publicly available data files.
Dataset Splits | No | The paper mentions generating data through "roll-outs" or "samples" from MDPs. It states, "We computed the algorithms' predictions with an increasing number of training data points". This indicates an online or simulated data-generation process rather than predefined train/validation/test splits of a static dataset.
Hardware Specification | Yes | The results are averages of 10 independent runs executed on a single core of an Intel i7 CPU.
Software Dependencies | No | All algorithms were implemented in Python; however, no specific Python version or versions of any other software libraries are mentioned.
Experiment Setup | Yes | The behavior of policy evaluation methods can be influenced by adjusting their hyperparameters. The authors set those parameters by performing an exhaustive grid search in the hyperparameter space, minimizing the MSBE (mean squared Bellman error, for the residual-gradient algorithm and BRM) or the MSPBE (mean squared projected Bellman error). Table 3 lists the values considered in the grid-search parameter optimization for the algorithms in Table 4.
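The exhaustive grid search described above can be sketched as follows. The grid values and the error callback are hypothetical stand-ins, not the actual entries of the paper's Table 3; in the paper the objective would be the MSBE or MSPBE of the learned value function.

```python
import itertools

import numpy as np

def grid_search(evaluate, grid):
    """Exhaustive search over a hyperparameter grid.

    grid: dict mapping parameter name -> list of candidate values.
    evaluate: callable taking a params dict and returning a scalar
    error criterion (e.g., MSBE or MSPBE) to be minimized.
    """
    names = list(grid)
    best_params, best_err = None, np.inf
    # try every combination of candidate values
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        err = evaluate(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

For example, a grid over step size and trace decay such as `{"alpha": [0.01, 0.1, 0.5], "lam": [0.0, 0.5, 1.0]}` would evaluate all nine combinations and return the one with the lowest error.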