Recurrent Natural Policy Gradient for POMDPs

Authors: Semih Cayci, Atilla Eryilmaz

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from long-term dependencies, thereby explaining limitations of RNN-based policy optimization for POMDPs... We study the performance of Rec-TD numerically in Section C under long-term and short-term dependencies to validate our theoretical results in Section 5.2... The performance of Rec-TD is studied numerically in Random-POMDP instances in Section C.
Researcher Affiliation | Academia | Semih Cayci, Department of Mathematics, RWTH Aachen University... Atilla Eryilmaz, Department of Electrical and Computer Engineering, The Ohio State University
Pseudocode | Yes | Algorithm 1 Recurrent Natural Actor-Critic (Rec-NAC); (a) High-level description
Open Source Code | No | No explicit statement or link to source code for the described methodology is provided in the paper.
Open Datasets | No | The paper mentions numerical experiments using a "randomly-generated finite POMDP instance" but does not provide access information or specify a publicly available dataset.
Dataset Splits | No | The paper describes generating random POMDP instances and performing "5 trials" but does not specify any train/test/validation splits, cross-validation setup, or other data partitioning methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | We first consider the performance of Rec-TD with learning rate η = 0.05, discount factor γ = 0.9 and RNNs with various choices of network width m. For p_exp = 0.8, the performance of Rec-TD is demonstrated in Figure 2... The exploration probability is reduced to p_exp = 0.25...
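The Open Datasets and Dataset Splits entries note that the experiments use a randomly generated finite POMDP instance and 5 trials, but the generator itself is not described here. The sketch below shows one way such an instance could be built and simulated. It is not the authors' Random-POMDP construction. The following are all assumptions: Dirichlet(1) rows for the transition and observation kernels, uniform rewards in [0, 1], the sizes (8 states, 3 actions, 4 observations), and the helper names `random_pomdp`, `reset`, `step` and `trial_rngs`.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class FinitePOMDP:
    """Finite POMDP kernels (hypothetical container, not from the paper)."""
    P: np.ndarray    # P[s, a, s']: transition probabilities
    O: np.ndarray    # O[s, y]: observation probabilities
    r: np.ndarray    # r[s, a]: rewards in [0, 1]
    mu0: np.ndarray  # mu0[s]: initial state distribution


def random_pomdp(n_states, n_actions, n_obs, rng):
    """Sample a random instance; Dirichlet(1) rows are an ASSUMED choice."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    O = rng.dirichlet(np.ones(n_obs), size=n_states)
    r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    mu0 = rng.dirichlet(np.ones(n_states))
    return FinitePOMDP(P=P, O=O, r=r, mu0=mu0)


def reset(pomdp, rng):
    """Draw s_0 ~ mu0 and the first observation y_0 ~ O[s_0]."""
    s = int(rng.choice(len(pomdp.mu0), p=pomdp.mu0))
    y = int(rng.choice(pomdp.O.shape[1], p=pomdp.O[s]))
    return s, y


def step(pomdp, s, a, rng):
    """Return (reward, next hidden state, next observation); the agent sees only y."""
    reward = float(pomdp.r[s, a])
    s_next = int(rng.choice(pomdp.P.shape[2], p=pomdp.P[s, a]))
    y_next = int(rng.choice(pomdp.O.shape[1], p=pomdp.O[s_next]))
    return reward, s_next, y_next


# One fixed instance (hypothetical sizes), then independent random streams for 5 trials.
instance = random_pomdp(n_states=8, n_actions=3, n_obs=4, rng=np.random.default_rng(0))
trial_rngs = [np.random.default_rng(ss) for ss in np.random.SeedSequence(1).spawn(5)]
```

In this sketch the instance is held fixed and only the trial randomness varies. That matches a reading of "5 trials" as repeated runs on a single "randomly-generated finite POMDP instance", though the paper may have resampled instances instead.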
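The Experiment Setup entry reports several Rec-TD settings: η = 0.05, γ = 0.9, p_exp = 0.8 (later reduced to 0.25) and "various choices of network width m". The Dataset Splits entry adds 5 trials. The sketch below collects these values into a configuration and shows one possible reading of the exploration probability. Under that reading, the behavior policy takes a uniformly random action with probability p_exp and otherwise follows the target policy. The following are assumptions, not details taken from the paper: this mixture form, the width values, and the names `RecTDConfig`, `exploratory_action_probs`, `sample_action` and `sweep`.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(frozen=True)
class RecTDConfig:
    eta: float = 0.05    # learning rate (reported value)
    gamma: float = 0.9   # discount factor (reported value)
    p_exp: float = 0.8   # exploration probability (reported: 0.8, later 0.25)
    width: int = 64      # RNN width m: HYPOTHETICAL placeholder, not reported
    num_trials: int = 5  # number of trials (reported)


def exploratory_action_probs(policy_probs, p_exp):
    """ASSUMED exploration: mix the target policy with a uniform distribution.

    Taking a uniform action with probability p_exp and otherwise sampling from
    the policy gives P(a) = p_exp / |A| + (1 - p_exp) * pi(a).
    """
    policy_probs = np.asarray(policy_probs, dtype=float)
    n_actions = policy_probs.shape[-1]
    return p_exp / n_actions + (1.0 - p_exp) * policy_probs


def sample_action(policy_probs, p_exp, rng):
    """Draw one action index from the exploratory behavior distribution."""
    probs = exploratory_action_probs(policy_probs, p_exp)
    return int(rng.choice(len(probs), p=probs))


def sweep(widths, p_exps=(0.8, 0.25)):
    """One config per (p_exp, width) pair; the width values are caller-chosen."""
    return [RecTDConfig(p_exp=p, width=m) for p in p_exps for m in widths]
```

For example, `sweep(widths=(16, 64, 256))` yields six configurations, with illustrative widths, for the two reported exploration probabilities. A frozen dataclass keeps each run's settings immutable, so a configuration can be logged alongside its results without risk of later mutation.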