reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization

Authors: Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study online learning in constrained MDPs (CMDPs), focusing on the goal of attaining sublinear strong regret and strong cumulative constraint violation. ... We answer such a question affirmatively, by providing an efficient policy optimization algorithm with e O(T) strong regret/violation. Our algorithm implements a primal-dual scheme that employs a state-of-the-art policy optimization approach for adversarial (unconstrained) MDPs as primal algorithm, and a UCB-like update for dual variables. ... The paper primarily focuses on presenting a novel algorithm, its theoretical properties, and bounds (e.g., e O(T) strong regret/violation), and includes proofs (Lemma 3, 4, 5, Theorem 1, 2). There are no sections describing experimental results, data analysis, or comparisons against baselines on datasets.
Researcher Affiliation	Academia	Francesco Emanuele Stradi Politecnico di Milano EMAIL, Matteo Castiglioni Politecnico di Milano EMAIL, Alberto Marchesi Politecnico di Milano EMAIL, Nicola Gatti Politecnico di Milano EMAIL
Pseudocode	Yes	Algorithm 1 Learner-Environment Interaction... Algorithm 2 Constrained Primal-Dual Policy Optimization (CPD-PO)
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials.
Open Datasets	No	The paper is theoretical in nature, focusing on algorithm design and proving theoretical bounds in Constrained Markov Decision Processes (CMDPs). It does not describe any empirical evaluations that would involve the use of specific datasets.
Dataset Splits	No	The paper is theoretical and does not describe experiments using specific datasets. Therefore, there is no mention of training, validation, or test dataset splits.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and theoretical analysis. It does not describe any experimental setup or mention specific hardware (like GPUs, CPUs, or TPUs) used for running experiments.
Software Dependencies	No	The paper is theoretical and focuses on algorithm design and analysis rather than implementation and experimentation. It does not mention any specific software libraries, frameworks, or their versions that would be needed to replicate experimental results.
Experiment Setup	No	The paper is theoretical and focuses on the design and analysis of an algorithm for online learning in CMDPs. It does not describe any empirical experiments, and therefore, does not include details such as hyperparameter values, training configurations, or system-level settings for an experimental setup.