A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Authors: Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "In the experiments, we apply QFR in 4-Sided Liar's Dice, Leduc Poker (Southey et al., 2005), Kuhn Poker (Kuhn, 1950), and 2×2 Abrupt Dark Hex. The experimental result of Algorithm 1 is presented in Figure 1. Figure 1 shows that QFR outperforms outcome-sampling CFR, CFR+, and BOMD in all games."
Researcher Affiliation: Academia. "Mingyang Liu, Gabriele Farina & Asuman Ozdaglar, LIDS, EECS, Massachusetts Institute of Technology, Cambridge, MA 02139, USA, EMAIL"
Pseudocode: Yes. "Algorithm 1: Q-Function based Regret minimization (QFR)"
Open Source Code: Yes. "The code of QFR and baselines for tabular games can be found in LiteEFG (Liu et al., 2024)." Footnote: https://github.com/liumy2010/LiteEFG/tree/main/LiteEFG/baselines
Open Datasets: Yes. "In the experiments, we apply QFR in 4-Sided Liar's Dice, Leduc Poker (Southey et al., 2005), Kuhn Poker (Kuhn, 1950), and 2×2 Abrupt Dark Hex. The code is based on LiteEFG (Liu et al., 2024) with game environments implemented by OpenSpiel (Lanctot et al., 2019)."
Dataset Splits: No. The paper uses extensive-form games (e.g., Leduc Poker, Kuhn Poker) as environments for experimentation. These are dynamic environments where agents interact, and performance is measured through metrics like exploitability over iterations rather than on fixed training, validation, and test splits. The paper does not describe any dataset splits in the conventional sense.
Hardware Specification: Yes. "Figure 1 and Figure 2 are conducted on 240 cores of Intel Xeon Platinum 8260 and Figure 3 is conducted on Intel(R) Xeon Gold 6248 with NVIDIA Volta V100."
Software Dependencies: No. "The code is based on LiteEFG (Liu et al., 2024) with game environments implemented by OpenSpiel (Lanctot et al., 2019). Our implementation of QFR is based on PPO (Schulman et al., 2017) in CleanRL (Huang et al., 2022)." While these are specific software components and frameworks, no version numbers are provided for reproducibility.
Experiment Setup: Yes. "In order to pick hyperparameters, we performed a grid search for QFR and MMD over the learning rate η, regularization τ, and perturbation γ, with the regularizer being either negative entropy or Euclidean distance. For Balanced OMD (BOMD) (Bai et al., 2022) and Balanced FTRL (Fiegel et al., 2023), we applied grid search to the learning rate η and fixed the exploration rate (IX parameter) to η/20 as suggested in Fiegel et al. (2023)."
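The hyperparameter selection described above can be sketched as a plain grid search. This is a minimal illustration only: the candidate grids and the `evaluate` function below are hypothetical placeholders (the report does not list the actual values searched), standing in for a run of QFR that returns a final exploitability.

```python
from itertools import product

# Hypothetical candidate grids; the actual values searched are not reported.
etas = [0.01, 0.1, 1.0]                      # learning rate η
taus = [0.001, 0.01, 0.1]                    # regularization τ
gammas = [0.0, 0.01, 0.1]                    # perturbation γ
regularizers = ["neg_entropy", "euclidean"]  # choice of regularizer

def evaluate(eta, tau, gamma, reg):
    """Placeholder for training QFR on a game (e.g., Kuhn Poker) with the
    given hyperparameters and returning the final exploitability.
    Here it is a dummy score so the sketch runs; lower is better."""
    return abs(eta - 0.1) + tau + gamma + (0.0 if reg == "neg_entropy" else 0.05)

# Exhaustively score every configuration and keep the best one.
best = min(product(etas, taus, gammas, regularizers),
           key=lambda cfg: evaluate(*cfg))
print(best)
```

With the dummy scorer above, the search selects η = 0.1, τ = 0.001, γ = 0.0 with the negative-entropy regularizer; a real sweep would rank configurations by measured exploitability instead.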