Polynomial-Time Approximability of Constrained Reinforcement Learning

Authors: Jeremy Mcmahan

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time (0, ̈)-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as P = NP. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature.
Researcher Affiliation Academia Department of Computer Science, University of Wisconsin Madison, Wisconsin, USA. Correspondence to: Jeremy Mc Mahan <EMAIL>.
Pseudocode Yes Algorithm 1 Reduction Algorithm 2 Augmented Interaction Algorithm 3 Approximate Backward Induction
Open Source Code No The paper does not contain any explicit statements about code availability, links to repositories, or supplementary materials containing source code for the methodology described.
Open Datasets No The paper is theoretical and focuses on algorithm design and computational complexity analysis of constrained Markov decision processes. It does not conduct experiments or use any specific datasets, therefore, no information about publicly available datasets is provided.
Dataset Splits No The paper is theoretical and does not involve experiments with datasets, so there are no dataset splits mentioned.
Hardware Specification No The paper is theoretical and focuses on algorithm design and analysis. It does not describe any experimental setups or specify hardware used.
Software Dependencies No The paper is theoretical and focuses on algorithms and complexity analysis. It does not mention any specific software dependencies or version numbers.
Experiment Setup No The paper is theoretical and describes algorithms and their complexity without detailing any practical experimental setup, hyperparameters, or training configurations.