On the Convergence Rates of Policy Gradient Methods

Authors: Lin Xiao

JMLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | "First, we develop a theory of weak gradient-mapping dominance and use it to prove sharp sublinear convergence rate of the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods ... enjoy a linear rate of convergence ... Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model." |
| Researcher Affiliation | Industry | Lin Xiao, Meta AI Research, Seattle, WA 98109, USA |
| Pseudocode | No | The paper describes the projected policy gradient method (equation (22)) and the policy mirror descent methods (equation (39)), but does not present them in structured pseudocode or algorithm blocks; hedged sketches of both updates follow this table. |
| Open Source Code | No | The paper makes no explicit statement about releasing source code, gives no link to a code repository, and does not mention code in supplementary materials. |
| Open Datasets | No | The paper is theoretical, analyzing convergence rates and sample complexity under a "simple generative model" (Section 5.1); it performs no experiments on publicly available datasets. |
| Dataset Splits | No | The paper involves no empirical experiments with datasets, so no dataset splits are discussed. |
| Hardware Specification | No | The paper presents no experimental results, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper details no experimental implementation, so it lists no software dependencies or version numbers. |
| Experiment Setup | No | The paper focuses on convergence analysis rather than experiments, so it provides no experimental setup or hyperparameter values. |
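
Since the paper presents its methods only in equation form, a minimal sketch may help readers. The following is an illustrative implementation, not the paper's code: it assumes the standard direct-parametrization form of the projected policy gradient update, pi_{k+1} = Proj(pi_k + eta * grad V^{pi_k}(rho)), which the assessment above cites as equation (22), on a small tabular MDP with exact policy evaluation. All function names (`policy_evaluation`, `project_simplex`, `ppg_step`) and the setup are hypothetical.

```python
import numpy as np

def policy_evaluation(P, r, pi, gamma):
    """Exact tabular policy evaluation: returns Q^pi of shape (S, A).
    P: transitions (S, A, S); r: rewards (S, A); pi: policy (S, A); gamma in [0, 1)."""
    S, _ = r.shape
    P_pi = np.einsum('sa,sat->st', pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum('sat,t->sa', P, V)

def project_simplex(v):
    """Row-wise Euclidean projection onto the probability simplex
    (the standard sorting-based algorithm)."""
    u = np.sort(v, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, v.shape[1] + 1)
    rho = np.sum(u - css / idx > 0, axis=1)
    theta = css[np.arange(v.shape[0]), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0.0)

def ppg_step(P, r, pi, rho0, gamma, eta):
    """One projected policy gradient step under the direct parametrization.
    The policy gradient theorem gives dV/dpi(a|s) = d^pi_rho0(s) * Q^pi(s, a) / (1 - gamma)."""
    S = P.shape[0]
    Q = policy_evaluation(P, r, pi, gamma)
    P_pi = np.einsum('sa,sat->st', pi, P)
    # discounted state-visitation distribution d^pi_rho0 over states
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho0)
    grad = d[:, None] * Q / (1 - gamma)
    return project_simplex(pi + eta * grad)
```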
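
Similarly, a hedged sketch of the policy mirror descent update (equation (39) in the paper's numbering): with the KL divergence as the Bregman distance, the PMD step reduces to a multiplicative-weights update, pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta_k * Q^{pi_k}(s, a)). The schedule eta_k = eta0 / gamma^k below is one way to realize the geometrically increasing step sizes associated with the paper's linear-convergence result; the helper names are again hypothetical, and the code reuses `policy_evaluation` from the previous sketch.

```python
import numpy as np  # reuses policy_evaluation from the sketch above

def pmd_kl_step(pi, Q, eta):
    """One KL-divergence policy mirror descent step:
    pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)).
    Subtracting the per-state max of Q only rescales each row before
    normalization, so it is a safe numerical-stability trick."""
    logits = np.log(pi) + eta * (Q - Q.max(axis=1, keepdims=True))
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

def run_pmd(P, r, gamma, eta0=1.0, iters=50):
    """PMD with geometrically increasing step sizes eta_k = eta0 / gamma**k
    (the kind of schedule the paper associates with linear convergence),
    using exact policy evaluation in place of the paper's inexact variant."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)   # uniform initial policy
    for k in range(iters):
        Q = policy_evaluation(P, r, pi, gamma)
        pi = pmd_kl_step(pi, Q, eta0 / gamma ** k)
    return pi
```

In the tabular setting this KL instance coincides with the natural policy gradient update, one of the special cases covered by the paper's general PMD analysis.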