Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

Authors: Sergio Rozada, Dongsheng Ding, Antonio G. Marques, Alejandro Ribeiro

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method in two continuous control problems: robot navigation and fluid control. Furthermore, we demonstrate that D-PGPD addresses the classical constrained navigation problem involving several types of cost functions and constraints. We show that D-PGPD can solve non-linear fluid control problems under constraints. We test D-PGPD on constrained robot navigation and fluid control problems (Figure 1). See Appendix F for more details. Figure 2: Avg. reward/utility value functions of AD-PGPD and PGDual iterates in the navigation problem. Figure 3: Avg. reward/utility value functions of AD-PGPD and PGDual iterates in the fluid velocity control problem.
Researcher Affiliation | Academia | 1. Dept. of Signal Theory and Communications, King Juan Carlos University; 2. Dept. of Electrical and Systems Engineering, University of Pennsylvania.
Pseudocode | Yes | We detail how to estimate V̂_g(π_t) and Q̂^{π_t}_{λ,τ}(s_n, a_n) via rollouts in Algorithms 1 and 2, which can be found in Appendix E. We use random-horizon rollouts (Paternain et al. 2020; Zhang et al. 2020) to guarantee that the stochastic estimates of Q̂^{π_t}_{λ,τ} and V̂_g(π_t) are unbiased. Combining (9), the SGD rule in (12), and averaging techniques leads to a sample-based algorithm, presented in Algorithm 3 in Appendix E.
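The random-horizon rollout trick the paper cites can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1/2: draw the horizon T from a geometric distribution with success probability 1 − γ and return the *undiscounted* reward sum over steps 0..T; since P(T ≥ t) = γ^t, the estimate is unbiased for the discounted value. The names `env_step`, `reset`, and `policy` are hypothetical placeholders for the environment and policy interfaces.

```python
import numpy as np

def random_horizon_value_estimate(env_step, reset, policy, gamma, rng):
    """Unbiased estimate of the discounted value V(s0).

    Random-horizon trick (Paternain et al. 2020; Zhang et al. 2020):
    sample T with P(T = t) = (1 - gamma) * gamma**t for t = 0, 1, ...,
    then sum UNdiscounted rewards over t = 0..T. Because P(T >= t) =
    gamma**t, the expectation of this sum equals sum_t gamma**t E[r_t].
    """
    # numpy's geometric has support {1, 2, ...}, so shift down by one.
    T = int(rng.geometric(1.0 - gamma)) - 1
    s = reset()
    total = 0.0
    for _ in range(T + 1):
        a = policy(s)
        s, r = env_step(s, a)  # hypothetical one-step transition: (state, reward)
        total += r
    return total
```

As a sanity check, in a degenerate MDP with constant reward 1 the estimates average to 1/(1 − γ), the true discounted value.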
Open Source Code | Yes | Code: https://github.com/sergiorozada12/d-pg-pd
Open Datasets | No | The paper describes two simulation problems: robot navigation and fluid control. It cites related work for these problems (Shimizu et al. 2020 and Ma et al. 2022 for navigation; Baker et al. 2000 for fluid control) but provides no access information (links, DOIs, repositories) for any dataset used in its experiments. All data are generated by simulating these environments.
Dataset Splits | No | The paper describes experiments in simulated environments (robot navigation and fluid control). There is no mention of dataset splits (e.g., training/validation/test percentages or counts) as one would expect for pre-existing datasets; learning and evaluation both take place within these continuous simulation environments.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used to run the computational experiments.
Software Dependencies | No | The paper does not mention specific software dependencies with version numbers (e.g., Python version, or libraries such as PyTorch or TensorFlow with their versions) used to implement the methods or run the experiments.
Experiment Setup | No | The paper describes the problem setups for robot navigation and fluid control in Section 6, including the dynamics and reward/constraint functions. However, it does not explicitly provide hyperparameters (e.g., learning rates, batch sizes, per-model iteration counts, or network architectures if function approximation is used) or other training configurations needed to reproduce the experimental results. It mentions comparing AD-PGPD with PGDual over 40,000 and 10,000 iterations, respectively, but lacks other details.