Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that with an optimistic planning oracle, our algorithm achieves sublinear regret and constraint violation in both cases and can attain the globally optimal policy of the original constrained problem. In this section, we provide theoretical analysis for Algorithms 1 and 2.
Researcher Affiliation | Academia | Zihao Li EMAIL Department of Electrical and Computer Engineering Princeton University Princeton, NJ 08544, USA; Boyi Liu EMAIL Department of Industrial Engineering and Management Sciences Northwestern University IL 60208, USA; Zhuoran Yang EMAIL Department of Statistics and Data Science Yale University CT 06511-6814, USA; Zhaoran Wang EMAIL Department of Industrial Engineering and Management Sciences Northwestern University IL 60208, USA; Mengdi Wang EMAIL Department of Electrical and Computer Engineering Princeton University Princeton, NJ 08544, USA
Pseudocode | Yes | Algorithm 1: Variational Primal-Dual Policy Optimization; Algorithm 2: VPDPO for KNR case; Algorithm 3: VPDPO for Low-rank MDP case
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. The license mentioned pertains to the paper itself, not to any accompanying code.
Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It discusses theoretical models and problem settings (e.g., Constrained Convex MDP, Kernelized Nonlinear Regulator, Low-rank MDP) rather than using or providing access to empirical datasets for evaluation.
Dataset Splits | No | The paper focuses on theoretical contributions and does not present empirical experiments that would require dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup involving specific hardware. No GPU, CPU, or other hardware specifications are mentioned.
Software Dependencies | No | The paper presents theoretical algorithms and does not describe their implementation or list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | No | The paper is theoretical, presenting algorithms and their performance guarantees, but does not provide specific experimental setup details such as hyperparameter values, learning rates, batch sizes, or training schedules.