Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies

Authors: Qi Ju, Thomas Tellier, Meng Sun, Zhemei Fang, Yunfeng Luo

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In our experiments with Texas Hold'em, Pref-CFR successfully trained Aggressive and Loose-Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns.
Researcher Affiliation Collaboration 1. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; 2. National Key Laboratory of Science and Technology on Multispectral Information Processing; 3. GTOKing. Correspondence to: Qi Ju <EMAIL>, Thomas Tellier <EMAIL>, Zhemei Fang <EMAIL>.
Pseudocode No The paper only presents mathematical equations and descriptions of algorithms (CFR, Pref-CFR) in prose and formulas, without any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code can be found at GitHub.
Open Datasets Yes Our experiments are conducted using Kuhn poker (Kuhn, 1950), Leduc poker (Shi & Littman, 2001) as well as two-player and three-player Texas Hold'em poker.
Dataset Splits No The paper describes experiments in game environments (Kuhn poker, Leduc poker, Texas Hold'em) which involve training AI agents through self-play or simulation. It does not provide specific training/test/validation dataset splits as would be typical for supervised learning tasks with static datasets.
Hardware Specification Yes Solutions were computed in under 10 minutes with a 24-core CPU; subgames included 35k states and the full game used for leaf estimates had 25M states.
Software Dependencies No The paper does not explicitly mention any specific software dependencies or their version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup Yes In training, we set δ(I, raise) = 5 and β = 0.05 at the first decision node of player 1.
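To make the reported setup concrete: the core update inside CFR is regret matching, and Pref-CFR biases it with a per-action preference term. The sketch below is a minimal illustration under stated assumptions, not the paper's exact formulation; the function name, the additive use of `delta`, and the action labels are illustrative, with only the value delta(I, raise) = 5 taken from the quoted setup.

```python
# Minimal regret-matching sketch (the core update inside CFR).
# The additive preference term `delta` is an illustrative assumption
# of how a Pref-CFR-style bias could enter, not the paper's formula.

def regret_matching(cum_regret, delta=None):
    """Turn cumulative regrets into a mixed strategy.

    cum_regret: dict mapping action -> cumulative counterfactual regret
    delta: optional dict mapping action -> preference bonus
    """
    biased = {
        a: r + (delta.get(a, 0.0) if delta else 0.0)
        for a, r in cum_regret.items()
    }
    positive = {a: max(r, 0.0) for a, r in biased.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: p / total for a, p in positive.items()}
    # Fall back to uniform when no action has positive regret.
    n = len(cum_regret)
    return {a: 1.0 / n for a in cum_regret}

# Biasing "raise" with delta = 5, as in the reported first-decision-node
# setting, shifts probability mass toward raising.
strategy = regret_matching({"fold": 1.0, "call": 2.0, "raise": 0.5},
                           delta={"raise": 5.0})
```

With these hypothetical regrets, the biased strategy assigns "raise" the largest probability while remaining a valid distribution over actions.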