Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Authors: Shicong Cen, Yuting Wei, Yuejie Chi

JMLR 2024

Reproducibility assessment (each entry lists the variable, the result, and the LLM response):
Research Type: Experimental. Figure 1 illustrates the performance of the proposed PU and OMWU methods for solving randomly generated entropy-regularized matrix games. Both algorithms converge linearly and achieve faster convergence rates as the regularization parameter increases. Figure 2 illustrates the performance of Algorithm 3 for solving a randomly generated entropy-regularized Markov game with |A| = |B| = 20, |S| = 100, and γ = 0.99, with varying choices of Tmain, Tsub, and τ.
Researcher Affiliation: Academia. Shicong Cen (Department of Electrical and Computer Engineering, Carnegie Mellon University); Yuting Wei (Department of Statistics and Data Science, The Wharton School, University of Pennsylvania); Yuejie Chi (Department of Electrical and Computer Engineering, Carnegie Mellon University).
Pseudocode: Yes. Algorithm 1: The PU method; Algorithm 2: The OMWU method; Algorithm 3: Policy Extragradient Method applied to Value Iteration for Entropy-regularized Markov Game.
Open Source Code: No. The paper does not explicitly state that source code is provided, nor does it link to a code repository. The 'License: CC-BY 4.0' notice refers to the license for the paper itself, not the code.
Open Datasets: No. The paper generates synthetic data for its performance illustrations (e.g., 'randomly generated entropy-regularized matrix games' and a 'random generated entropy-regularized Markov game') and does not use or provide access to any external public datasets.
Dataset Splits: No. The paper uses randomly generated data for illustration and does not specify any training, validation, or test dataset splits.
Hardware Specification: No. The paper provides no specific details about the hardware used to run the performance illustrations (e.g., CPU or GPU models, memory, or cluster specifications).
Software Dependencies: No. The paper does not mention any specific software dependencies or versions (e.g., programming languages, libraries, or solvers with version numbers) used for the experiments.
Experiment Setup: Yes. Figure 1: the learning rates are fixed as η = 0.1 ... with entropy regularization parameter τ = 0.01 ... at the 1000th iteration with different choices of τ. Figure 2: the learning rates of both players are fixed as η = 0.005 ... with varying choices of Tmain, Tsub, and τ. Algorithm 3, Step 4: ... where the initialization is set to uniform distributions.
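The matrix-game experiment above can be reproduced in spirit with a short script. The following is a minimal sketch, assuming the standard entropy-regularized multiplicative-weights form x ∝ x^(1−ητ)·exp(−η·Ay) combined in a two-step (extragradient-style) iteration; it is not the paper's exact Algorithm 1, and the payoff matrix, τ = 0.1, and η = 0.1 are illustrative choices for a quickly converging demo rather than the paper's Figure 1 settings.

```python
import numpy as np

def pu_step(x, y, A, eta, tau):
    """One predictive-update (extragradient-style) step for the
    entropy-regularized matrix game min_x max_y x^T A y; sketch only,
    not the paper's exact Algorithm 1."""
    def mwu(z, shift):
        # Entropy-regularized multiplicative-weights update:
        # z_new(a) ∝ z(a)^(1 - eta*tau) * exp(shift(a)), normalized in log-space.
        logits = (1.0 - eta * tau) * np.log(z) + shift
        w = np.exp(logits - logits.max())
        return w / w.sum()

    # Prediction (midpoint) step using the current iterates' gradients.
    x_mid = mwu(x, -eta * (A @ y))
    y_mid = mwu(y,  eta * (A.T @ x))
    # Update step using the midpoint gradients.
    x_new = mwu(x, -eta * (A @ y_mid))
    y_new = mwu(y,  eta * (A.T @ x_mid))
    return x_new, y_new

rng = np.random.default_rng(0)
n = 20                                   # |A| = |B| = 20, as in Figure 2
A = rng.uniform(-1.0, 1.0, size=(n, n))  # hypothetical random payoff matrix
x = np.full(n, 1.0 / n)                  # uniform initialization
y = np.full(n, 1.0 / n)
eta, tau = 0.1, 0.1                      # illustrative hyperparameters

for _ in range(2000):
    x_prev, y_prev = x, y
    x, y = pu_step(x, y, A, eta, tau)

# Change between consecutive iterates; linear convergence drives this to ~0.
gap = max(np.abs(x - x_prev).max(), np.abs(y - y_prev).max())
print(f"iterate change after 2000 steps: {gap:.2e}")
```

Consistent with the linear convergence reported for Figure 1, the iterate change shrinks geometrically (at a rate governed by ητ), and increasing τ speeds up convergence at the cost of a more biased equilibrium.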