reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recurrent Natural Policy Gradient for POMDPs

Authors: Semih Cayci, Atilla Eryilmaz

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from long-term dependencies, thereby explaining limitations of RNN-based policy optimization for POMDPs... We study the performance of Rec-TD numerically in Section C under long-term and short-term dependencies to validate our theoretical results in Section 5.2... The performance of Rec-TD is studied numerically in Random-POMDP instances in Section C.
Researcher Affiliation	Academia	Semih Cayci EMAIL Department of Mathematics RWTH Aachen University... Atilla Eryilmaz EMAIL Department of Electrical and Computer Engineering The Ohio State University
Pseudocode	Yes	Algorithm 1: Recurrent Natural Actor-Critic (Rec-NAC), (a) High-level description
Open Source Code	No	No explicit statement or link to source code for the described methodology is provided in the paper.
Open Datasets	No	The paper mentions numerical experiments using "randomly-generated finite POMDP instance" but does not provide access information or specify a publicly available dataset.
Dataset Splits	No	The paper describes generating random POMDP instances and performing "5 trials" but does not specify any train/test/validation splits, cross-validation setup, or other data partitioning methodology.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	Yes	We first consider the performance of Rec-TD with learning rate η = 0.05, discount factor γ = 0.9 and RNNs with various choices of network width m. For p_exp = 0.8, the performance of Rec-TD is demonstrated in Figure 2... The exploration probability is reduced to p_exp = 0.25...
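The Experiment Setup row lists the only hyperparameters the paper reports (learning rate η = 0.05, discount factor γ = 0.9, RNN width m, exploration probability p_exp). To show which details a reproduction would still have to pin down, here is a minimal sketch of semi-gradient TD(0) with those values on a randomly generated finite POMDP. It is NOT the authors' Rec-TD. The following are all assumptions: the POMDP generator, the state/action/observation sizes, feeding only the observation into the RNN, freezing the random recurrent weights and training only a linear readout, and interpreting p_exp as "uniform random action with probability p_exp, otherwise action 0". The names `random_pomdp` and `run_td` are hypothetical.

```python
import math
import random


def random_pomdp(n_states, n_actions, n_obs, rng):
    """Hypothetical random finite POMDP (not the paper's generator).

    P[s][a] is a distribution over next states, O[s] a distribution over
    observations, and R[s][a] a reward in [0, 1].
    """
    def dist(k):
        w = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(w)
        return [x / total for x in w]

    P = [[dist(n_states) for _ in range(n_actions)] for _ in range(n_states)]
    O = [dist(n_obs) for _ in range(n_states)]
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, O, R


def draw(probs, rng):
    """Sample an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def run_td(m, n_steps=2000, eta=0.05, gamma=0.9, p_exp=0.8,
           n_states=5, n_actions=2, n_obs=3, seed=0):
    """Semi-gradient TD(0) with a linear readout on a fixed random RNN state.

    Simplification (assumption): recurrent weights stay at their random
    initialization; only the readout vector `theta` is learned.
    """
    rng = random.Random(seed)
    P, O, R = random_pomdp(n_states, n_actions, n_obs, rng)
    # Fixed random Elman-RNN weights: W (m x m) recurrent, U (m x n_obs) input.
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)] for _ in range(m)]
    U = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(m)]

    def step_hidden(h, obs):
        # h' = tanh(W h + U e_obs); U e_obs is column `obs` of U.
        return [math.tanh(sum(W[i][j] * h[j] for j in range(m)) + U[i][obs])
                for i in range(m)]

    def features(h):
        # Scale by 1/sqrt(m) so ||phi||^2 <= 1 and eta * ||phi||^2 stays small.
        return [x / math.sqrt(m) for x in h]

    def value(theta, phi):
        return sum(t * f for t, f in zip(theta, phi))

    theta = [0.0] * m
    s = rng.randrange(n_states)
    h = step_hidden([0.0] * m, draw(O[s], rng))
    td_errors = []
    for _ in range(n_steps):
        # Hypothetical behavior policy: uniform action w.p. p_exp, else action 0.
        a = rng.randrange(n_actions) if rng.random() < p_exp else 0
        r = R[s][a]
        s = draw(P[s][a], rng)
        h_next = step_hidden(h, draw(O[s], rng))
        phi, phi_next = features(h), features(h_next)
        # TD error: delta = r + gamma * V(h') - V(h).
        delta = r + gamma * value(theta, phi_next) - value(theta, phi)
        theta = [t + eta * delta * f for t, f in zip(theta, phi)]
        td_errors.append(delta)
        h = h_next
    return theta, td_errors


if __name__ == "__main__":
    for p in (0.8, 0.25):
        for m in (4, 16):
            _, errs = run_td(m, p_exp=p)
            tail = errs[-500:]
            print(f"p_exp={p} m={m} mean sq TD error={sum(d * d for d in tail) / len(tail):.4f}")
```

The usage loop sweeps the two exploration probabilities and two widths, mirroring the kind of comparison the quoted setup describes. Whatever numbers it prints come from this toy construction and say nothing about the paper's Figure 2. Scaling the features by 1/sqrt(m) is a design choice that keeps the step size η = 0.05 stable at every width.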