Policy-Regret Minimization in Markov Games with Function Approximation
Authors: Thanh Nguyen-Tang, Raman Arora
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a general algorithmic framework that achieves the optimal O(T) policy regret for a wide class of large-scale problems characterized by an Eluder-type condition... We show that BOVL attains a policy regret bound of V (m + H) d EγT, where V is the scale of the value functions, m the memory of the adversary, H the episode length, d E and γ the Eluder-type and covering-type complexities of the function classes, respectively, and T the total number of episodes. The full proof appears in Appendix A. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. Correspondence to: Raman Arora <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 BOVL(F, Ψ, ΠA , T, K) Batching and Optimism based on Value and Likelihood fitting |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on any specific dataset. It discusses problem settings like 'large state and action spaces' or 'linear Markov game' as abstract models rather than concrete datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with datasets, thus no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental results, therefore no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies or their version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical, presenting an algorithmic framework and its theoretical guarantees. It does not include an experimental section or details on hyperparameter settings or system-level configurations. |