Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning
Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that with an optimistic planning oracle, our algorithm achieves sublinear regret and constraint violation in both cases and can attain the globally optimal policy of the original constrained problem. In this section, we provide theoretical analysis for Algorithms 1 and 2. |
| Researcher Affiliation | Academia | Zihao Li, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA; Boyi Liu, Department of Industrial Engineering and Management Sciences, Northwestern University, IL 60208, USA; Zhuoran Yang, Department of Statistics and Data Science, Yale University, CT 06511-6814, USA; Zhaoran Wang, Department of Industrial Engineering and Management Sciences, Northwestern University, IL 60208, USA; Mengdi Wang, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA |
| Pseudocode | Yes | Algorithm 1 Variational Primal-Dual Policy Optimization; Algorithm 2 VPDPO for KNR case; Algorithm 3 VPDPO for Low-rank MDP case |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. The license mentioned pertains to the paper itself, not accompanying code. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It discusses theoretical models and problem settings (e.g., Constrained Convex MDP, Kernelized Nonlinear Regulator, Low-rank MDP) rather than utilizing or providing access to empirical datasets for evaluation. |
| Dataset Splits | No | The paper focuses on theoretical contributions and does not present empirical experiments that would require dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup involving specific hardware. No GPU, CPU, or other hardware specifications are mentioned. |
| Software Dependencies | No | The paper presents theoretical algorithms and does not describe their implementation or list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | No | The paper is theoretical, presenting algorithms and their performance guarantees, but does not provide specific experimental setup details such as hyperparameter values, learning rates, batch sizes, or training schedules. |