A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Authors: Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we apply QFR in 4-Sided Liar's Dice, Leduc Poker (Southey et al., 2005), Kuhn Poker (Kuhn, 1950), and 2×2 Abrupt Dark Hex. The experimental result of Algorithm 1 is presented in Figure 1. Figure 1 shows that QFR outperforms outcome-sampling CFR, CFR+, and BOMD in all games. |
| Researcher Affiliation | Academia | Mingyang Liu, Gabriele Farina & Asuman Ozdaglar, LIDS, EECS, Massachusetts Institute of Technology, Cambridge, MA 02139, USA |
| Pseudocode | Yes | Algorithm 1 Q-Function based Regret minimization (QFR) |
| Open Source Code | Yes | The code of QFR and baselines for tabular games can be found in LiteEFG (Liu et al., 2024): https://github.com/liumy2010/LiteEFG/tree/main/LiteEFG/baselines |
| Open Datasets | Yes | In the experiments, we apply QFR in 4-Sided Liar's Dice, Leduc Poker (Southey et al., 2005), Kuhn Poker (Kuhn, 1950), and 2×2 Abrupt Dark Hex. The code is based on LiteEFG (Liu et al., 2024) with game environments implemented by OpenSpiel (Lanctot et al., 2019). |
| Dataset Splits | No | The paper uses extensive-form games (e.g., Leduc Poker, Kuhn Poker) as environments for experimentation. These are dynamic environments where agents interact, and performance is measured through metrics like 'exploitability' over iterations rather than on fixed training, validation, and test splits. The paper does not describe any specific dataset splits in the conventional sense for reproducibility. |
| Hardware Specification | Yes | The experiments in Figure 1 and Figure 2 are conducted on 240 cores of Intel Xeon Platinum 8260, and Figure 3 is conducted on Intel Xeon Gold 6248 with an NVIDIA Volta V100. |
| Software Dependencies | No | The code is based on LiteEFG (Liu et al., 2024) with game environments implemented by OpenSpiel (Lanctot et al., 2019). Our implementation of QFR is based on PPO (Schulman et al., 2017) in CleanRL (Huang et al., 2022). While these are specific software components and frameworks, no version numbers are provided for reproducibility. |
| Experiment Setup | Yes | In order to pick hyperparameters, we performed a grid-search for QFR and MMD on learning rate η, regularization τ, perturbation γ, and the regularizer is either negative entropy or Euclidean distance. For Balanced OMD (BOMD) (Bai et al., 2022) and Balanced FTRL (Fiegel et al., 2023), we applied grid search to the learning rate η and fixed the exploration rate (IX parameter) to η/20 as suggested in Fiegel et al. (2023). |
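The grid search described in the experiment setup can be sketched as a simple enumeration over the four hyperparameter axes the paper names (learning rate η, regularization τ, perturbation γ, and the choice of regularizer). The candidate values below are placeholders — the paper does not list the exact grid values searched:

```python
from itertools import product

# Hypothetical candidate values; the paper names the axes (eta, tau,
# gamma, regularizer) but does not report the concrete grids searched.
learning_rates = [0.01, 0.1, 1.0]                      # eta
regularizations = [0.01, 0.1]                          # tau
perturbations = [0.0, 0.1]                             # gamma
regularizers = ["negative_entropy", "euclidean"]

def hyperparameter_grid():
    """Yield every combination in the Cartesian product of the axes."""
    for eta, tau, gamma, reg in product(
        learning_rates, regularizations, perturbations, regularizers
    ):
        yield {"eta": eta, "tau": tau, "gamma": gamma, "regularizer": reg}

configs = list(hyperparameter_grid())
print(len(configs))  # 3 * 2 * 2 * 2 = 24 configurations
```

Each configuration would then be run and scored by exploitability, with the best-performing combination reported — the standard pattern for the tabular experiments described above.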