reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We prove that with an optimistic planning oracle, our algorithm achieves sublinear regret and constraint violation in both cases and can attain the globally optimal policy of the original constrained problem. In this section, we provide theoretical analysis for Algorithms 1 and 2.
Researcher Affiliation	Academia	Zihao Li, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA; Boyi Liu, Department of Industrial Engineering and Management Sciences, Northwestern University, IL 60208, USA; Zhuoran Yang, Department of Statistics and Data Science, Yale University, CT 06511-6814, USA; Zhaoran Wang, Department of Industrial Engineering and Management Sciences, Northwestern University, IL 60208, USA; Mengdi Wang, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA
Pseudocode	Yes	Algorithm 1 Variational Primal-Dual Policy Optimization; Algorithm 2 VPDPO for KNR case; Algorithm 3 VPDPO for Low-rank MDP case
Open Source Code	No	The paper does not contain any explicit statement about releasing source code or a link to a code repository. The license mentioned pertains to the paper itself, not accompanying code.
Open Datasets	No	The paper is theoretical and does not conduct experiments on specific datasets. It discusses theoretical models and problem settings (e.g., Constrained Convex MDP, Kernelized Nonlinear Regulator, Low-rank MDP) rather than utilizing or providing access to empirical datasets for evaluation.
Dataset Splits	No	The paper focuses on theoretical contributions and does not present empirical experiments that would require dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup involving specific hardware. No GPU, CPU, or other hardware specifications are mentioned.
Software Dependencies	No	The paper presents theoretical algorithms and does not describe their implementation or list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup	No	The paper is theoretical, presenting algorithms and their performance guarantees, but does not provide specific experimental setup details such as hyperparameter values, learning rates, batch sizes, or training schedules.