Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

Authors: Sergio Rozada, Dongsheng Ding, Antonio G. Marques, Alejandro Ribeiro

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method in two continuous control problems: robot navigation and fluid control. Furthermore, we demonstrate that D-PGPD addresses the classical constrained navigation problem involving several types of cost functions and constraints. We show that D-PGPD can solve non-linear fluid control problems under constraints. We test D-PGPD on constrained robot navigation and fluid control problems (Figure 1). See Appendix F for more details. Figure 2: Avg. reward/utility value functions of AD-PGPD and PGDual iterates in the navigation problem. Figure 3: Avg. reward/utility value functions of AD-PGPD and PGDual iterates in the fluid velocity control problem.
Researcher Affiliation | Academia | 1. Dept. of Signal Theory and Communications, King Juan Carlos University; 2. Dept. of Electrical and Systems Engineering, University of Pennsylvania.
Pseudocode | Yes | We detail how to estimate V̂_g(π_t) and Q̂^{π_t}_{λ,τ}(s_n, a_n) via rollouts in Algorithms 1 and 2, which can be found in Appendix E. We use random-horizon rollouts (Paternain et al. 2020; Zhang et al. 2020) to guarantee that the stochastic estimates of Q̂^{π_t}_{λ,τ} and V̂_g(π_t) are unbiased. Combining (9), the SGD rule in (12), and averaging techniques leads to a sample-based algorithm, presented in Algorithm 3 in Appendix E.
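The random-horizon rollout trick the paper cites can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1/2: draw the horizon T from a geometric distribution with success probability 1 − γ and return the *undiscounted* reward sum over steps 0..T; since P(T ≥ t) = γ^t, the estimate is unbiased for the discounted value. The names `env_step`, `reset`, and `policy` are hypothetical placeholders for the environment and policy interfaces.

```python
import numpy as np

def random_horizon_value_estimate(env_step, reset, policy, gamma, rng):
    """Unbiased estimate of the discounted value V(s0).

    Random-horizon trick (Paternain et al. 2020; Zhang et al. 2020):
    sample T with P(T = t) = (1 - gamma) * gamma**t for t = 0, 1, ...,
    then sum UNdiscounted rewards over t = 0..T. Because P(T >= t) =
    gamma**t, the expectation of this sum equals sum_t gamma**t E[r_t].
    """
    # numpy's geometric has support {1, 2, ...}, so shift down by one.
    T = int(rng.geometric(1.0 - gamma)) - 1
    s = reset()
    total = 0.0
    for _ in range(T + 1):
        a = policy(s)
        s, r = env_step(s, a)  # hypothetical one-step transition: (state, reward)
        total += r
    return total
```

As a sanity check, in a degenerate MDP with constant reward 1 the estimates average to 1/(1 − γ), the true discounted value.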
Open Source Code | Yes | Code: https://github.com/sergiorozada12/d-pg-pd
Open Datasets | No | The paper describes two simulation problems: robot navigation and fluid control. It cites related work for these problems (Shimizu et al. 2020 and Ma et al. 2022 for navigation; Baker et al. 2000 for fluid control) but provides no access information (links, DOIs, repositories) for any dataset used in its experiments. All data are generated by simulating these environments.
Dataset Splits | No | The paper describes experiments in simulated environments (robot navigation and fluid control). There is no mention of dataset splits (e.g., training/validation/test percentages or counts) as one would expect for pre-existing datasets; learning and evaluation both take place within these continuous simulation environments.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used to run the computational experiments.
Software Dependencies | No | The paper does not mention specific software dependencies with version numbers (e.g., Python version, or libraries such as PyTorch or TensorFlow with their versions) used to implement the methods or run the experiments.
Experiment Setup | No | The paper describes the problem setups for robot navigation and fluid control in Section 6, including the dynamics and reward/constraint functions. However, it does not explicitly provide hyperparameters (e.g., learning rates, batch sizes, per-model iteration counts, or network architectures if function approximation is used) or other training configurations needed to reproduce the experimental results. It mentions comparing AD-PGPD with PGDual over 40,000 and 10,000 iterations, respectively, but lacks other details.