Natural Option Critic
Authors: Saket Tiwari, Philip S. Thomas (pp. 5175-5182)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results showcase improvement over the vanilla gradient approach. |
| Researcher Affiliation | Academia | Saket Tiwari, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003. Philip S. Thomas, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003. |
| Pseudocode | Yes | Algorithm 1 Incremental Natural Option-Critic Algorithm (INOC) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We compare natural option-critic with the option critic framework on the Arcade Learning Environment (Bellemare et al. 2013). The four rooms domain (Sutton, Precup, and Singh 1999) is a particularly favorable case for demonstrating the use of options. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments rather than using traditional datasets with specified training, validation, and test splits. It does not provide explicit dataset split information for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as CPU or GPU models, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions software components like 'RMSProp' but does not specify version numbers for any software, libraries, or environments, which would be necessary for reproducible dependency management. |
| Experiment Setup | Yes | MDP Setup: ... We set the learning rate for the intra-option policies, αθ, to be negligible... Four Rooms: The four rooms domain... αθ = αϑ = 0.0025, αη = 0.5, αϕ = 0.75, λ = 0.5 and critic LR 0.5... Arcade Learning Environment: ... with αθ = αϑ = 0.0025, αη = αϕ = 0.75, and λ = 0.5 |
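
The step sizes quoted in the Experiment Setup row can be gathered into a small configuration object when attempting a reproduction. The sketch below is only an illustration: the class name `INOCHyperparams`, its field names, and the constants `FOUR_ROOMS` and `ALE` are hypothetical and do not come from the paper. Only the numeric values are taken from the row above. Values that the row elides, such as the MDP setup, are left out rather than guessed, and the critic learning rate is stored as `None` for the Arcade Learning Environment because the row reports it only for Four Rooms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class INOCHyperparams:
    """Hypothetical container for the step sizes quoted in the table."""
    alpha_theta: float     # αθ, learning rate for the intra-option policies
    alpha_vartheta: float  # αϑ, reported equal to αθ in both domains
    alpha_eta: float       # αη
    alpha_phi: float       # αϕ
    lam: float             # λ
    critic_lr: Optional[float] = None  # reported only for Four Rooms


# Four Rooms: αθ = αϑ = 0.0025, αη = 0.5, αϕ = 0.75, λ = 0.5, critic LR 0.5
FOUR_ROOMS = INOCHyperparams(
    alpha_theta=0.0025,
    alpha_vartheta=0.0025,
    alpha_eta=0.5,
    alpha_phi=0.75,
    lam=0.5,
    critic_lr=0.5,
)

# Arcade Learning Environment: αθ = αϑ = 0.0025, αη = αϕ = 0.75, λ = 0.5
ALE = INOCHyperparams(
    alpha_theta=0.0025,
    alpha_vartheta=0.0025,
    alpha_eta=0.75,
    alpha_phi=0.75,
    lam=0.5,
)
```

Freezing the dataclass keeps a reported setting from being changed by accident during a run. A variant can be derived with `dataclasses.replace(FOUR_ROOMS, lam=0.9)` without touching the original.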