Decentralized Robust V-learning for Solving Markov Games with Model Uncertainty
Authors: Shaocong Ma, Ziyi Chen, Shaofeng Zou, Yi Zhou
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we develop a theoretical solution to Markov games with environment model uncertainty. Specifically, we propose a new and tractable notion of robust correlated equilibria for Markov games with environment model uncertainty. In particular, we prove that the robust correlated equilibrium has a simple modification structure, and its characterization of equilibria critically depends on the environment model uncertainty. Moreover, we propose the first fully decentralized stochastic algorithm for computing such a robust correlated equilibrium. Our analysis proves that the algorithm achieves the polynomial episode complexity $\widetilde{O}(SA^2H^5\epsilon^{-2})$ for computing an approximate robust correlated equilibrium with $\epsilon$ accuracy. |
| Researcher Affiliation | Academia | Shaocong Ma EMAIL Department of Electrical and Computer Engineering University of Utah Salt Lake City, UT 84112, USA; Ziyi Chen EMAIL Department of Electrical and Computer Engineering University of Utah Salt Lake City, UT 84112, USA; Shaofeng Zou EMAIL Department of Electrical Engineering University at Buffalo, The State University of New York Buffalo, NY 14260, USA; Yi Zhou EMAIL Department of Electrical and Computer Engineering University of Utah Salt Lake City, UT 84112, USA |
| Pseudocode | Yes | Algorithm 1: Decentralized Robust V-Learning ($j$-th player)... Algorithm 2: Implement output policy $\hat{\pi}_{k,h}$. (Algorithm 3 from Jin et al. (2022a))... Algorithm 3: Adversarial bandit algorithm (ADV_BANDIT) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper focuses on theoretical solutions for Markov games with model uncertainty, using examples like KL divergence and R-contamination models, which are theoretical uncertainty sets. It does not mention any specific public or open datasets used for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, therefore no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for computation. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis, rather than implementation details. It does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting algorithms and their convergence analysis. It does not contain an experimental section or details regarding hyperparameters or training configurations. |
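The table mentions two uncertainty sets, KL divergence and R-contamination. In robust V-learning these sets are what make a Bellman backup "robust": the next-state value is evaluated under the worst transition kernel in the set rather than under the nominal one.

The Python sketch below computes that worst-case expectation for a single state-action pair. It is an illustration under stated assumptions, not the paper's implementation:

- The function names, the radius parameters `R` and `delta`, and the grid over the dual variable are hypothetical.
- The R-contamination set is assumed to be $\{(1-R)p + Rq : q \in \Delta(\mathcal{S})\}$. Its worst case has the closed form $(1-R)\,\mathbb{E}_p[V] + R\min_s V(s)$.
- The KL set is assumed to be $\{q : \mathrm{KL}(q \,\|\, p) \le \delta\}$. It is handled through the standard dual $\sup_{\alpha>0}\{-\alpha\log\mathbb{E}_p[e^{-V/\alpha}] - \alpha\delta\}$.
- The dual is maximized on a finite grid, so the result can only under-estimate the exact dual value.

```python
import math
from typing import Optional, Sequence


def r_contamination_worst_case(p: Sequence[float], v: Sequence[float], R: float) -> float:
    """Worst-case E_q[V] over the (assumed) R-contamination set
    {(1 - R) * p + R * q : q any distribution over next states}.

    The adversary places all of its R mass on the lowest-value state,
    which gives the closed form (1 - R) * E_p[V] + R * min_s V(s).
    """
    if not 0.0 <= R <= 1.0:
        raise ValueError("R must lie in [0, 1]")
    nominal = sum(pi * vi for pi, vi in zip(p, v))
    return (1.0 - R) * nominal + R * min(v)


def _log_mean_exp_neg(p: Sequence[float], v: Sequence[float], alpha: float) -> float:
    """Numerically stable log E_p[exp(-V / alpha)] using log-sum-exp.

    States with p = 0 are skipped: any q with KL(q || p) finite must also
    put zero mass on them, so they cannot affect the worst case.
    """
    terms = [math.log(pi) - vi / alpha for pi, vi in zip(p, v) if pi > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def kl_worst_case(
    p: Sequence[float],
    v: Sequence[float],
    delta: float,
    alphas: Optional[Sequence[float]] = None,
) -> float:
    """Approximate inf {E_q[V] : KL(q || p) <= delta} through its dual
    sup_{alpha > 0} [ -alpha * log E_p[exp(-V / alpha)] - alpha * delta ].

    The supremum is taken over a finite (hypothetical) grid of alpha values,
    so the returned number is a lower bound on the exact dual value.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if alphas is None:
        # Log-spaced grid from 1e-4 to 1e3; an arbitrary illustrative choice.
        alphas = [10 ** (k / 10) for k in range(-40, 31)]
    return max(-a * _log_mean_exp_neg(p, v, a) - a * delta for a in alphas)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # nominal next-state distribution (made-up numbers)
    v = [1.0, 0.0, 2.0]  # next-state values (made-up numbers)
    print(r_contamination_worst_case(p, v, R=0.1))  # about 0.81
    print(kl_worst_case(p, v, delta=0.05))  # between min(v) and E_p[V] = 0.9
```

For R-contamination the worst case is exact and cheap, since it needs only a minimum over next states. For KL it requires a one-dimensional search over $\alpha$. A finer grid, or a scalar optimizer, tightens the approximation.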