Robust Average-Reward Reinforcement Learning
Authors: Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we numerically verify our theoretical results. We aim to verify two aspects of our methods: the convergence of the algorithms and their robustness. Additional experiments can be found in Appendix A. |
| Researcher Affiliation | Collaboration | Yue Wang (University of Central Florida); Alvaro Velasquez (University of Colorado Boulder); George Atia (University of Central Florida); Ashley Prater-Bennette (Air Force Research Laboratory); Shaofeng Zou (University at Buffalo, The State University of New York) |
| Pseudocode | Yes | Algorithm 1 (Robust VI: Policy Evaluation); Algorithm 2 (Robust VI: Optimal Control); Algorithm 3 (Robust RVI); Algorithm 4 (Robust RVI TD); Algorithm 5 (Robust RVI Q-learning) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the methodology described. |
| Open Datasets | Yes | We first verify the convergence of our robust RVI TD and Q-learning algorithms under a Garnet problem G(30, 20) (Archibald et al., 1995). We then consider the recycling robot problem (Example 3.3, Sutton & Barto, 2018). We further verify our robust RVI TD algorithm and robust RVI Q-learning under the Frozen Lake environment of OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes using problem environments like the Garnet problem, Recycling Robot, and Frozen-Lake. While these environments define how data (experiences/trajectories) are generated during reinforcement learning, the paper does not specify fixed training/test/validation splits of pre-collected datasets in terms of percentages, sample counts, or explicit splitting methodologies. For example, it doesn't state how collected trajectories are divided for evaluation beyond the inherent process of RL training and policy evaluation. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU/CPU models, memory, or accelerator types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, programming languages, or environments used in the experiments. It mentions 'Open AI' in Appendix A.2, but not a version. |
| Experiment Setup | Yes | We set the radius of the uncertainty set ζ = 0.4, α_n = 0.01, f(V) = Σ_s V(s)/|S| and f(Q) = Σ_{s,a} Q(s,a)/(|S||A|). We set ζ = 0.4 and implement our algorithms and vanilla Q-learning under the nominal environment (α = β = 0.5) with stepsize 0.01. We first set ζ = 0.4 and α_t = 0.01, and implement our algorithms and vanilla Q-learning under the nominal environment, where D_t ∼ Uniform(0, 16) is generated following the uniform distribution. |
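The setup row above combines three ingredients of the paper's experiments: a robust RVI Q-learning update, a reference function f(Q) equal to the mean of all Q-values, and an uncertainty set of radius ζ = 0.4 with stepsize 0.01. The sketch below illustrates how these pieces fit together on a made-up 2-state, 2-action MDP, assuming an R-contamination uncertainty set (worst case over (1−ζ)p + ζq, whose support function is (1−ζ)·E_p[V] + ζ·min_s V(s)); the transition kernel `P` and rewards `R` are hypothetical, and this is not the paper's exact implementation.

```python
import random

random.seed(0)

S, A = 2, 2
zeta, alpha = 0.4, 0.01          # uncertainty radius and stepsize from the setup row

# Hypothetical nominal kernel P[s][a] = next-state distribution, and rewards R[s][a]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.6, 0.4], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]

Q = [[0.0] * A for _ in range(S)]

def f(Q):
    # Reference function from the setup row: average of all Q-values
    return sum(sum(row) for row in Q) / (S * A)

for _ in range(50_000):
    s = random.randrange(S)
    a = random.randrange(A)
    s2 = random.choices(range(S), weights=P[s][a])[0]   # sample from nominal kernel
    V = [max(Q[t]) for t in range(S)]
    # Under an R-contamination set, a zeta-fraction of transition mass may be
    # moved adversarially, so the worst case lands on the lowest-value state.
    robust_target = (1 - zeta) * V[s2] + zeta * min(V)
    # Relative (RVI-style) update: subtract f(Q) to keep iterates bounded
    Q[s][a] += alpha * (R[s][a] + robust_target - f(Q) - Q[s][a])

print(f(Q))   # estimate of the robust average reward
```

As in the paper's algorithms, subtracting the reference function f(Q) in each update is what anchors the iterates, since average-reward relative value functions are only defined up to a constant shift.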