Decentralized Robust V-learning for Solving Markov Games with Model Uncertainty

Authors: Shaocong Ma, Ziyi Chen, Shaofeng Zou, Yi Zhou

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we develop a theoretical solution to Markov games with environment model uncertainty. Specifically, we propose a new and tractable notion of robust correlated equilibria for Markov games with environment model uncertainty. In particular, we prove that the robust correlated equilibrium has a simple modification structure, and its characterization of equilibria critically depends on the environment model uncertainty. Moreover, we propose the first fully decentralized stochastic algorithm for computing such a robust correlated equilibrium. Our analysis proves that the algorithm achieves the polynomial episode complexity $\widetilde{O}(SA^2H^5\epsilon^{-2})$ for computing an approximate robust correlated equilibrium with $\epsilon$ accuracy.
Researcher Affiliation | Academia | Shaocong Ma, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA; Ziyi Chen, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA; Shaofeng Zou, Department of Electrical Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA; Yi Zhou, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA
Pseudocode | Yes | Algorithm 1: Decentralized Robust V-Learning (j-th player)... Algorithm 2: Implement output policy $\hat{\pi}_{k,h}$ (Algorithm 3 from Jin et al. (2022a))... Algorithm 3: Adversarial bandit algorithm (ADV_BANDIT)
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories.
Open Datasets | No | The paper focuses on theoretical solutions for Markov games with model uncertainty, using examples such as the KL divergence and R-contamination models, which are theoretical uncertainty sets. It does not mention any specific public or open datasets used for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets; therefore, no dataset splits are provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for computation.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis rather than implementation details. It does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, presenting algorithms and their convergence analysis. It does not contain an experimental section or details regarding hyperparameters or training configurations.
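
The R-contamination model named in the Open Datasets entry can be illustrated with a short sketch. This is not code from the paper, which releases none. It assumes the standard definition of the uncertainty set, {(1 - r) p + r q : q any distribution over next states}, for which the worst-case expected value has the closed form (1 - r) E_p[V] + r min_s V(s). The function name, argument names, and example numbers below are hypothetical, chosen only for illustration.

```python
def r_contamination_worst_case(p, v, r):
    """Worst-case expectation of v over the set {(1 - r) * p + r * q}.

    p: nominal next-state distribution (assumed input, list of floats)
    v: value of each next state (list of floats, same length as p)
    r: contamination level in [0, 1]
    """
    if len(p) != len(v):
        raise ValueError("p and v must have the same length")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    # Expected value under the nominal model.
    nominal_mean = sum(pi * vi for pi, vi in zip(p, v))
    # The adversary places all contaminating mass on the lowest-value state.
    return (1.0 - r) * nominal_mean + r * min(v)


# Illustrative numbers only: three next states, 10% contamination.
nominal = [0.5, 0.3, 0.2]
values = [1.0, 2.0, 4.0]
robust = r_contamination_worst_case(nominal, values, r=0.1)
```

With r = 0 the result reduces to the nominal expectation, and with r = 1 it reduces to the minimum value, which matches the intuition that a larger uncertainty set yields a more pessimistic robust value.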