reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Adversarial MDPs with Stochastic Hard Constraints

Authors: Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we design an algorithm attaining sublinear regret and cumulative positive constraints violation. In the second scenario, under the mild assumption that a policy strictly satisfying the constraints exists and is known to the learner, we design an algorithm that achieves sublinear regret while ensuring that constraints are satisfied at every episode with high probability. In the last scenario, we only assume the existence of a strictly feasible policy, which is not known to the learner, and we design an algorithm attaining sublinear regret and constant cumulative positive constraints violation. Finally, we show that in the last two scenarios, a dependence on the Slater s parameter is unavoidable. To the best of our knowledge, our work is the first to study CMDPs involving both adversarial losses and hard constraints. Thus, our algorithms can deal with general non-stationary environments subject to requirements much stricter than those manageable with existing ones, enabling their adoption in a much wider range of applications. This paper presents theoretical results and has the goal of advancing the field of Machine Learning. There are no potential societal consequences of our work which we feel must be highlighted here.
Researcher Affiliation	Academia	1DEIB, Politecnico di Milano, Milan, Italy. Correspondence to: Francesco Emanuele Stradi <EMAIL>.
Pseudocode	Yes	Algorithm 2 SV-OPS Require: X, A, α, T, δ, η, γ
Open Source Code	No	The paper does not provide any information about open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on algorithm design and mathematical proofs for constrained Markov decision processes (CMDPs). It does not perform empirical evaluations or mention the use of any datasets.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with datasets. Therefore, it does not specify any dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any empirical experiments, therefore no hardware specifications are provided.
Software Dependencies	No	The paper is theoretical and focuses on algorithm design and proofs. It does not describe any empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and mathematical proofs. It does not describe any empirical experiments, and therefore, no experimental setup details are provided.