Policy-Regret Minimization in Markov Games with Function Approximation

Authors: Thanh Nguyen-Tang, Raman Arora

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose a general algorithmic framework that achieves the optimal O(T) policy regret for a wide class of large-scale problems characterized by an Eluder-type condition... We show that BOVL attains a policy regret bound of V (m + H) d EγT, where V is the scale of the value functions, m the memory of the adversary, H the episode length, d E and γ the Eluder-type and covering-type complexities of the function classes, respectively, and T the total number of episodes. The full proof appears in Appendix A.
Researcher Affiliation Academia 1Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. Correspondence to: Raman Arora <EMAIL>.
Pseudocode Yes Algorithm 1 BOVL(F, Ψ, ΠA , T, K) Batching and Optimism based on Value and Likelihood fitting
Open Source Code No The paper does not contain any explicit statements or links indicating the availability of open-source code for the methodology described.
Open Datasets No The paper is theoretical and does not conduct experiments on any specific dataset. It discusses problem settings like 'large state and action spaces' or 'linear Markov game' as abstract models rather than concrete datasets.
Dataset Splits No The paper is theoretical and does not involve empirical evaluation with datasets, thus no dataset splits are discussed.
Hardware Specification No The paper is theoretical and does not describe experimental results, therefore no hardware specifications are provided.
Software Dependencies No The paper is theoretical and does not specify any software dependencies or their version numbers for implementation or experimentation.
Experiment Setup No The paper is theoretical, presenting an algorithmic framework and its theoretical guarantees. It does not include an experimental section or details on hyperparameter settings or system-level configurations.