reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Natural Option Critic

Authors: Saket Tiwari, Philip S. Thomas | Pages: 5175-5182

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results showcase improvement over the vanilla gradient approach.
Researcher Affiliation	Academia	Saket Tiwari, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, EMAIL; Philip S. Thomas, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, EMAIL
Pseudocode	Yes	Algorithm 1 Incremental Natural Option-Critic Algorithm (INOC)
Open Source Code	No	The paper does not contain any statement about releasing source code or a link to a repository for the methodology described.
Open Datasets	Yes	We compare natural option-critic with the option critic framework on the Arcade Learning Environment (Bellemare et al. 2013). The four rooms domain (Sutton, Precup, and Singh 1999) is a particularly favorable case for demonstrating the use of options.
Dataset Splits	No	The paper describes experiments in reinforcement learning environments rather than using traditional datasets with specified training, validation, and test splits. It does not provide explicit dataset split information for reproduction.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run its experiments, such as CPU or GPU models, or cloud instance specifications.
Software Dependencies	No	The paper mentions software components like 'RMSProp' but does not specify version numbers for any software, libraries, or environments, which would be necessary for reproducible dependency management.
Experiment Setup	Yes	MDP Setup: ... We set the learning rate for the intra-option policies, αθ, to be negligible... Four Rooms: The four rooms domain... αθ = αϑ = 0.0025, αη = 0.5, αϕ = 0.75, λ = 0.5 and critic LR 0.5... Arcade Learning Environment: ... with αθ = αϑ = 0.0025, αη = αϕ = 0.75, and λ = 0.5
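
For anyone attempting a reproduction, the hyperparameters quoted in the Experiment Setup row could be recorded in a small configuration like the sketch below. The dictionary layout, the key names (plain transliterations of the paper's symbols, e.g. `alpha_vartheta` for αϑ), and the helper `differing_keys` are assumptions made for illustration and do not come from the paper. Values elided in the quote above (for example the "negligible" αθ in the MDP setup) are deliberately left out.

```python
# Hyperparameters exactly as quoted in the Experiment Setup row.
# Key names are hypothetical transliterations of the paper's symbols.
REPORTED_HYPERPARAMS = {
    "four_rooms": {
        "alpha_theta": 0.0025,
        "alpha_vartheta": 0.0025,
        "alpha_eta": 0.5,
        "alpha_phi": 0.75,
        "lambda": 0.5,
        "critic_lr": 0.5,
    },
    "arcade_learning_environment": {
        "alpha_theta": 0.0025,
        "alpha_vartheta": 0.0025,
        "alpha_eta": 0.75,
        "alpha_phi": 0.75,
        "lambda": 0.5,
    },
}


def differing_keys(a, b):
    """Return the sorted keys whose values differ, or that appear in only one setting."""
    keys = set(a) | set(b)
    return sorted(k for k in keys if a.get(k) != b.get(k))


if __name__ == "__main__":
    four_rooms = REPORTED_HYPERPARAMS["four_rooms"]
    ale = REPORTED_HYPERPARAMS["arcade_learning_environment"]
    # Lists the settings that are not shared between the two quoted setups.
    print(differing_keys(four_rooms, ale))
```

Keeping the two setups side by side makes it easy to see which reported values differ between domains and which settings a reproduction would still have to recover from the paper itself.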