A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
Authors: Yue Frank Wu, Weitong Zhang, Pan Xu, Quanquan Gu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\lVert \nabla J(\theta) \rVert_2^2 \le \epsilon$) of the non-concave performance function $J(\theta)$, with $\tilde{O}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods. |
| Researcher Affiliation | Academia | Yue Wu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095; Weitong Zhang, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095; Pan Xu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095 |
| Pseudocode | Yes | Algorithm 1 Two Time-Scale Actor-Critic |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository for the described methodology. |
| Open Datasets | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not mention specific datasets or their public availability for training. |
| Dataset Splits | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not provide details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. No hardware details are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is purely theoretical and focuses on mathematical analysis and proofs. It does not describe any experimental setup details such as hyperparameters or system-level training settings. |