Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
Authors: Shicong Cen, Yuting Wei, Yuejie Chi
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 illustrates the performance of the proposed PU and OMWU methods for solving randomly generated entropy-regularized matrix games. Both algorithms converge linearly and achieve faster convergence rates as the regularization parameter increases. Figure 2 illustrates the performance of Algorithm 3 for solving a randomly generated entropy-regularized Markov game with |A| = |B| = 20, |S| = 100, and γ = 0.99 with varying choices of Tmain, Tsub, and τ. |
| Researcher Affiliation | Academia | Shicong Cen EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University; Yuting Wei EMAIL Department of Statistics and Data Science, The Wharton School University of Pennsylvania; Yuejie Chi EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: The PU method; Algorithm 2: The OMWU method; Algorithm 3: Policy Extragradient Method applied to Value Iteration for Entropy-regularized Markov Game |
| Open Source Code | No | The paper does not explicitly state that source code is provided or offer a link to a code repository. The mention of 'License: CC-BY 4.0' refers to the license for the paper itself, not the code. |
| Open Datasets | No | The paper describes generating synthetic data for performance illustration (e.g., 'randomly generated entropy-regularized matrix games' and 'random generated entropy-regularized Markov game') but does not use or provide access to any external public datasets. |
| Dataset Splits | No | The paper uses randomly generated data for illustration and does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the performance illustrations (e.g., CPU, GPU models, memory, or cluster specifications). |
| Software Dependencies | No | The paper does not mention any specific software dependencies or versions (e.g., programming languages, libraries, or solvers with version numbers) used for the experiments. |
| Experiment Setup | Yes | Figure 1: The learning rates are fixed as η = 0.1. ... with the entropy regularization parameter τ = 0.01 ... at 1000-th iteration with different choices of τ. Figure 2: The learning rates of both players are fixed as η = 0.005. ... with varying choices of Tmain, Tsub and τ. Algorithm 3, Step 4: ...where the initialization is set as uniform distributions. |
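To make the experimental setup above concrete, the following is a minimal sketch of a generic entropy-regularized multiplicative-weights-style update for a two-player zero-sum matrix game. It is not the paper's exact PU or OMWU algorithm (those include a prediction/extragradient step); it only illustrates the style of iteration the learning rate η and regularization parameter τ from Figure 1 plug into. The function name and the damping form `(1 - η·τ)` on the log-iterate are illustrative assumptions.

```python
import numpy as np

def entropy_reg_mwu_step(x, y, A, eta, tau):
    """One entropy-regularized multiplicative-weights-style step for the
    zero-sum matrix game max_x min_y x^T A y with entropy regularization.

    Hypothetical sketch: the exponent (1 - eta*tau) on the current iterate
    damps it toward the uniform distribution, which is what entropy
    regularization contributes; larger tau damps more aggressively.
    """
    logits_x = (1 - eta * tau) * np.log(x) + eta * (A @ y)
    logits_y = (1 - eta * tau) * np.log(y) - eta * (A.T @ x)
    # Subtract the max before exponentiating for numerical stability.
    x_new = np.exp(logits_x - logits_x.max())
    y_new = np.exp(logits_y - logits_y.max())
    return x_new / x_new.sum(), y_new / y_new.sum()

# Toy run mirroring the Figure 1 setup in spirit: a random payoff matrix,
# uniform initialization, fixed learning rate eta = 0.1.
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(20, 20))
x = np.ones(20) / 20
y = np.ones(20) / 20
for _ in range(1000):
    x, y = entropy_reg_mwu_step(x, y, A, eta=0.1, tau=0.01)
```

Both iterates remain valid probability distributions throughout, and with τ > 0 the iteration targets the quantal-response equilibrium of the regularized game rather than a Nash equilibrium of the original one.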