reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Polynomial-Time Approximability of Constrained Reinforcement Learning

Authors: Jeremy Mcmahan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time (0, ̈)-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as P = NP. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature.
Researcher Affiliation	Academia	Department of Computer Science, University of Wisconsin Madison, Wisconsin, USA. Correspondence to: Jeremy Mc Mahan <EMAIL>.
Pseudocode	Yes	Algorithm 1 Reduction Algorithm 2 Augmented Interaction Algorithm 3 Approximate Backward Induction
Open Source Code	No	The paper does not contain any explicit statements about code availability, links to repositories, or supplementary materials containing source code for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on algorithm design and computational complexity analysis of constrained Markov decision processes. It does not conduct experiments or use any specific datasets, therefore, no information about publicly available datasets is provided.
Dataset Splits	No	The paper is theoretical and does not involve experiments with datasets, so there are no dataset splits mentioned.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and analysis. It does not describe any experimental setups or specify hardware used.
Software Dependencies	No	The paper is theoretical and focuses on algorithms and complexity analysis. It does not mention any specific software dependencies or version numbers.
Experiment Setup	No	The paper is theoretical and describes algorithms and their complexity without detailing any practical experimental setup, hyperparameters, or training configurations.