On the Convergence Rates of Policy Gradient Methods
Authors: Lin Xiao
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | First, we develop a theory of weak gradient-mapping dominance and use it to prove sharp sublinear convergence rate of the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods... enjoy a linear rate of convergence... Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model. |
| Researcher Affiliation | Industry | Lin Xiao EMAIL Meta AI Research Seattle, WA 98109, USA |
| Pseudocode | No | The paper describes methods like the projected policy gradient method using equation (22) and policy mirror descent methods using equation (39), but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | No | The paper is theoretical and analyzes convergence rates and sample complexity under a 'simple generative model' (Section 5.1). It does not perform experiments on publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not present any experimental results, so there is no mention of hardware specifications used. |
| Software Dependencies | No | The paper is theoretical and does not detail any experimental implementation, so it does not list software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on convergence analysis, not experimental implementation. Therefore, it does not provide details on experimental setup or hyperparameter values. |
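
Since the paper gives no pseudocode, the following is a minimal illustrative sketch (not the author's implementation) of one member of the policy mirror descent class named in the table: a tabular natural policy gradient update with geometrically increasing step sizes. It assumes a cost-minimization MDP, exact policy evaluation approximated by fixed-point iteration, and a step-size growth factor of `1/gamma`; the toy transition tensor `P`, cost table `c`, and all function names are hypothetical and chosen only for illustration. The precise step-size conditions and rates are those stated in the paper, not this sketch.

```python
import math


def evaluate_q(P, c, pi, gamma, sweeps=2000):
    """Approximate V^pi and Q^pi by fixed-point iteration on the Bellman equation."""
    n_s, n_a = len(c), len(c[0])
    V = [0.0] * n_s
    for _ in range(sweeps):
        V = [
            sum(
                pi[s][a] * (c[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s)))
                for a in range(n_a)
            )
            for s in range(n_s)
        ]
    Q = [
        [c[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s)) for a in range(n_a)]
        for s in range(n_s)
    ]
    return Q, V


def npg_geometric(P, c, gamma, iters=30, eta0=1.0):
    """KL-based mirror descent step: pi_new(a|s) is proportional to pi(a|s) * exp(-eta * Q(s, a))."""
    n_s, n_a = len(c), len(c[0])
    # Keep the policy in log space so large step sizes do not cause overflow.
    logp = [[-math.log(n_a)] * n_a for _ in range(n_s)]
    eta = eta0
    history = []  # value function of the policy at the start of each iteration
    for _ in range(iters):
        pi = [[math.exp(x) for x in row] for row in logp]
        Q, V = evaluate_q(P, c, pi, gamma)
        history.append(V)
        for s in range(n_s):
            z = [logp[s][a] - eta * Q[s][a] for a in range(n_a)]
            m = max(z)
            lse = m + math.log(sum(math.exp(x - m) for x in z))
            logp[s] = [x - lse for x in z]  # normalize onto the simplex
        eta /= gamma  # assumed geometric growth of the step size
    pi = [[math.exp(x) for x in row] for row in logp]
    return pi, history


# Hypothetical 2-state, 2-action MDP: P[s][a][t] is the probability of moving to state t.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
c = [[1.0, 0.5], [0.2, 0.8]]  # per-step costs c[s][a]
gamma = 0.9

pi, history = npg_geometric(P, c, gamma)
```

Each entry of `history` is the value function of the policy at the start of that iteration, so comparing successive entries shows how quickly the costs decrease on this toy problem.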