Unsupervised Basis Function Adaptation for Reinforcement Learning
Authors: Edward Barker, Charl Ras
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "...and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice." [...] "To corroborate our theoretical analysis, and to further address the more complex question of whether PASA will improve overall performance, we outline some experimental results in Section 4. We explore three different types of environment: a GARNET environment, a Gridworld type environment, and an environment representative of a logistics problem. Our experimental results suggest that PASA, and potentially, by extension, techniques based on similar principles, can significantly boost performance when compared to SARSA with fixed state aggregation." |
| Researcher Affiliation | Academia | Edward Barker EMAIL School of Mathematics and Statistics University of Melbourne Melbourne, Victoria 3010, Australia; Charl Ras EMAIL School of Mathematics and Statistics University of Melbourne Melbourne, Victoria 3010, Australia |
| Pseudocode | Yes | The PASA algorithm is outlined in Algorithm 1 and a diagram illustrating the main steps is at Figure 1. Note that the algorithm calls a procedure called Split, which is outlined in Algorithm 2. |
| Open Source Code | No | The text does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper describes generating environments (GARNET, Gridworld, logistics problem) for experiments but does not provide concrete access information (links, DOIs, citations) for publicly available datasets used or created. |
| Dataset Splits | No | Each experiment was run for 100 individual trials for both SARSA-P, SARSA-F and (where applicable) SARSA with no state aggregation, using the same sequence of randomly generated environments. Each trial was run over 500 million iterations. For our experiments some minor changes have been made to the algorithm SARSA-P as we outlined it above (that is, changes which go beyond merely more efficiently implementing the same operations described in Algorithms 1 and 2). |
| Hardware Specification | Yes | The majority of experiments were run on an Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz for both algorithm variants. |
| Software Dependencies | No | The paper describes algorithms (SARSA, PASA) and mentions other RL techniques (TD(λ), Q-learning), but does not specify any particular software or library names with version numbers used for their implementation or experiments. |
| Experiment Setup | Yes | The parameters of PASA were kept the same for all environment types, with the exceptions of X0 and X (with X being changed for SARSA-F as well). The value of X0 was always set to X/2. The parameters used are shown in Table 2. Furthermore (as summarised in Table 5) SARSA-P requires only marginally greater computational time than SARSA-F, consistent with our discussion in Section 3.1. While we have not measured it explicitly, the same is certainly true for memory demands. |