Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies
Authors: Qi Ju, Thomas Tellier, Meng Sun, Zhemei Fang, Yunfeng Luo
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments with Texas Hold'em, Pref-CFR successfully trained Aggressive and Loose-Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns. |
| Researcher Affiliation | Collaboration | (1) School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; (2) National Key Laboratory of Science and Technology on Multispectral Information Processing; (3) GTOKing. Correspondence to: Qi Ju <EMAIL>, Thomas Tellier <EMAIL>, Zhemei Fang <EMAIL>. |
| Pseudocode | No | The paper only presents mathematical equations and descriptions of algorithms (CFR, Pref-CFR) in prose and formulas, without any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at GitHub. |
| Open Datasets | Yes | Our experiments are conducted using Kuhn poker (Kuhn, 1950), Leduc poker (Shi & Littman, 2001), as well as two-player and three-player Texas Hold'em poker. |
| Dataset Splits | No | The paper describes experiments in game environments (Kuhn poker, Leduc poker, Texas Hold'em) in which AI agents are trained through self-play or simulation. It does not provide training/test/validation dataset splits, as would be typical for supervised learning on static datasets. |
| Hardware Specification | Yes | Solutions were computed in under 10 minutes with a 24-core CPU; subgames included 35k states and the full game used for leaf estimates had 25M states. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or their version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In training, we set δ(I, raise) = 5 and β = 0.05 at the first decision node of player 1. |
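Since the paper presents its algorithms only as equations (no pseudocode), the regret-matching rule at the core of CFR can be sketched for context. This is a generic, illustrative sketch of standard regret matching, not the paper's Pref-CFR implementation; the preference parameters δ and β quoted above modify this update in a way specified only by the paper's formulas, and all names below are assumptions.

```python
import numpy as np

def regret_matching(cumulative_regret: np.ndarray) -> np.ndarray:
    """Convert cumulative counterfactual regrets at an infoset into a strategy.

    Standard regret-matching rule used by vanilla CFR (a sketch, not the
    paper's Pref-CFR variant): play each action in proportion to its
    positive cumulative regret; if no regret is positive, play uniformly.
    """
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No positive regret anywhere: fall back to the uniform strategy.
    return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))
```

For example, regrets `[2.0, 1.0, -3.0]` over three actions yield the mixed strategy `[2/3, 1/3, 0]`, since only the positive regrets are normalized.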