Unsupervised Basis Function Adaptation for Reinforcement Learning
Authors: Edward Barker, Charl Ras
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "...and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice." [...] "To corroborate our theoretical analysis, and to further address the more complex question of whether PASA will improve overall performance, we outline some experimental results in Section 4. We explore three different types of environment: a GARNET environment, a Gridworld type environment, and an environment representative of a logistics problem. Our experimental results suggest that PASA, and potentially, by extension, techniques based on similar principles, can significantly boost performance when compared to SARSA with fixed state aggregation." |
| Researcher Affiliation | Academia | Edward Barker EMAIL School of Mathematics and Statistics University of Melbourne Melbourne, Victoria 3010, Australia; Charl Ras EMAIL School of Mathematics and Statistics University of Melbourne Melbourne, Victoria 3010, Australia |
| Pseudocode | Yes | The PASA algorithm is outlined in Algorithm 1 and a diagram illustrating the main steps is at Figure 1. Note that the algorithm calls a procedure called Split, which is outlined in Algorithm 2. |
| Open Source Code | No | The text does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper describes generating environments (GARNET, Gridworld, logistics problem) for experiments but does not provide concrete access information (links, DOIs, citations) for publicly available datasets used or created. |
| Dataset Splits | No | Each experiment was run for 100 individual trials for both SARSA-P, SARSA-F and (where applicable) SARSA with no state aggregation, using the same sequence of randomly generated environments. Each trial was run over 500 million iterations. For our experiments some minor changes have been made to the algorithm SARSA-P as we outlined it above (that is, changes which go beyond merely more efficiently implementing the same operations described in Algorithms 1 and 2). |
| Hardware Specification | Yes | The majority of experiments were run on an Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz for both algorithm variants. |
| Software Dependencies | No | The paper describes algorithms (SARSA, PASA) and mentions other RL techniques (TD(λ), Q-learning), but does not specify any particular software or library names with version numbers used for their implementation or experiments. |
| Experiment Setup | Yes | The parameters of PASA were kept the same for all environment types, with the exceptions of X0 and X (with X being changed for SARSA-F as well). The value of X0 was always set to X/2. The parameters used are shown in Table 2. Furthermore (as summarised in Table 5) SARSA-P requires only marginally greater computational time than SARSA-F, consistent with our discussion in Section 3.1. While we have not measured it explicitly, the same is certainly true for memory demands. |