Recurrent Natural Policy Gradient for POMDPs
Authors: Semih Cayci, Atilla Eryilmaz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from long-term dependencies, thereby explaining limitations of RNN-based policy optimization for POMDPs... We study the performance of Rec-TD numerically in Section C under long-term and short-term dependencies to validate our theoretical results in Section 5.2... The performance of Rec-TD is studied numerically in Random-POMDP instances in Section C. |
| Researcher Affiliation | Academia | Semih Cayci, Department of Mathematics, RWTH Aachen University... Atilla Eryilmaz, Department of Electrical and Computer Engineering, The Ohio State University |
| Pseudocode | Yes | Algorithm 1: Recurrent Natural Actor-Critic (Rec-NAC), high-level description |
| Open Source Code | No | No explicit statement or link to source code for the described methodology is provided in the paper. |
| Open Datasets | No | The paper mentions numerical experiments on a "randomly-generated finite POMDP instance" but neither provides access information nor points to a publicly available dataset. |
| Dataset Splits | No | The paper describes generating random POMDP instances and performing "5 trials" but does not specify any train/test/validation splits, cross-validation setup, or other data partitioning methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We first consider the performance of Rec-TD with learning rate η = 0.05, discount factor γ = 0.9 and RNNs with various choices of network width m. For p_exp = 0.8, the performance of Rec-TD is demonstrated in Figure 2... The exploration probability is reduced to p_exp = 0.25... |
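
Since no source code is released, the following is only a minimal sketch of how the reported setup could be written down for a reproduction attempt. The values η = 0.05, γ = 0.9, p_exp ∈ {0.8, 0.25} and the 5 trials come from the table above. Everything else is an assumption made for illustration: the names `RecTDConfig`, `make_random_pomdp` and `rollout`, the state/observation/action counts, the width value, the uniform-simplex sampling of the kernels, and the reading of p_exp as the probability of taking a uniformly random action. It does not implement Rec-TD or Rec-NAC.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RecTDConfig:
    # Values quoted in the table above.
    learning_rate: float = 0.05  # eta
    discount: float = 0.9        # gamma
    p_exp: float = 0.8           # exploration probability (0.25 in the second setting)
    num_trials: int = 5
    # Hypothetical values: not reported in the excerpt.
    width: int = 64              # RNN width m (the paper varies this)
    num_states: int = 4
    num_obs: int = 3
    num_actions: int = 2


def random_simplex(rng, n):
    """Sample a probability vector uniformly from the simplex (normalized exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def make_random_pomdp(cfg, seed):
    """Build a finite POMDP with random kernels (assumed construction, not the paper's)."""
    rng = random.Random(seed)
    S, O, A = cfg.num_states, cfg.num_obs, cfg.num_actions
    transition = [[random_simplex(rng, S) for _ in range(A)] for _ in range(S)]  # P[s][a][s']
    observation = [random_simplex(rng, O) for _ in range(S)]                     # Phi[s][o]
    reward = [[rng.random() for _ in range(A)] for _ in range(S)]                # r[s][a]
    return transition, observation, reward


def rollout(cfg, pomdp, horizon, seed):
    """Collect (observation, action, reward) triples under an assumed exploratory policy."""
    transition, observation, reward = pomdp
    rng = random.Random(seed)
    state = rng.randrange(cfg.num_states)
    trajectory = []
    for _ in range(horizon):
        obs = rng.choices(range(cfg.num_obs), weights=observation[state])[0]
        if rng.random() < cfg.p_exp:
            action = rng.randrange(cfg.num_actions)  # explore: uniformly random action
        else:
            action = 0  # placeholder for the action a learned policy would choose
        trajectory.append((obs, action, reward[state][action]))
        state = rng.choices(range(cfg.num_states), weights=transition[state][action])[0]
    return trajectory


def discounted_return(cfg, trajectory):
    """Sum of gamma^t * r_t over the trajectory."""
    return sum((cfg.discount ** t) * r for t, (_, _, r) in enumerate(trajectory))
```

Seeding both the instance and the rollout separately makes each of the 5 trials repeatable, which is one way to compensate for the missing hardware and software details noted above.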