JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

Authors: Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Marcus McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi... By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements... We evaluate the algorithm empirically using a We Chat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm s effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results.
Researcher Affiliation Collaboration Yang Li1, Kun Xiong2, Yingping Zhang2, Jiangcheng Zhu2, Stephen McAleer3, Wei Pan1, Jun Wang4, Zonghong Dai2, Yaodong Yang5; 1The University of Manchester, 2Huawei, 3Carnegie Mellon University, 4University College London, 5Peking University
Pseudocode Yes Algorithm 1: Algorithm for building the payoff matrix M Algorithm 2: Algorithm for Populationer
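Algorithm 1 itself is not reproduced in this report, but the general shape of building a payoff matrix M over a population of agents can be sketched as follows. This is a hypothetical stand-in: the paper's version plays Xiangqi games between agents, whereas here `play` is an illustrative callback returning agent i's average score against agent j in [-1, 1]. A rock-paper-scissors toy population makes the non-transitive cycle the paper studies visible directly in M.

```python
import itertools

# Sketch in the spirit of Algorithm 1 (building the payoff matrix M).
# `play(a, b)` is a hypothetical evaluation callback; the paper's actual
# procedure plays Xiangqi games between the two agents instead.
def build_payoff_matrix(agents, play):
    n = len(agents)
    M = [[0.0] * n for _ in range(n)]
    for i, j in itertools.product(range(n), repeat=2):
        M[i][j] = play(agents[i], agents[j])
    return M

# Toy population with a non-transitive cycle (rock-paper-scissors).
BEATS = {"R": "S", "P": "R", "S": "P"}

def rps_play(a, b):
    if BEATS[a] == b:
        return 1.0
    if BEATS[b] == a:
        return -1.0
    return 0.0

M = build_payoff_matrix(["R", "P", "S"], rps_play)
# M is antisymmetric with a cycle: R loses to P, P loses to S, S loses to R.
```

The antisymmetry of M (M[i][j] = -M[j][i]) is what makes the two-player zero-sum structure explicit; a transitive game would admit a consistent ranking of the agents, which the cycle here rules out.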
Open Source Code No Our project site is available at https://sites.google.com/view/jiangjun-site/.
Open Datasets No In this study, we delve into the intricate geometry of Xiangqi, leveraging a dataset comprising over 10,000 game records from human gameplay as the foundational basis for our investigation... To thoroughly analyze Xiangqi's geometry, we obtained a dataset consisting of over 10,000 records of real-world Xiangqi games, which were sourced from the PlayOK game platform1... 1www.playok.com
Dataset Splits Yes
Deployment Time | Stage | Wins | Ties | Losses | Total | Win Rate
Month 1 | Training | 717 | 11 | 8 | 736 | 97.42%
Month 2 | Training | 724 | 0 | 17 | 741 | 97.71%
Month 3 | Training | 462 | 0 | 3 | 465 | 99.35%
Month 4-6 | Evaluation | 5089 | 3 | 27 | 5119 | 99.41%
Table 2: Monthly statistics of the JiangJun mini-program over a six-month period are presented in this table. The data is divided into two stages: Training and Evaluation.
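The quoted win rates can be cross-checked directly from the raw counts, taking win rate = wins / (wins + ties + losses); a quick sketch using only the numbers from Table 2 as quoted above:

```python
# Cross-check of Table 2: win rate = wins / (wins + ties + losses),
# with the counts quoted above from the paper.
rows = [
    ("Month 1", "Training", 717, 11, 8),
    ("Month 2", "Training", 724, 0, 17),
    ("Month 3", "Training", 462, 0, 3),
    ("Month 4-6", "Evaluation", 5089, 3, 27),
]
for period, stage, wins, ties, losses in rows:
    total = wins + ties + losses
    print(f"{period} ({stage}): {wins}/{total} = {100 * wins / total:.2f}%")
```

All four totals and percentages reproduce the table, including the headline 99.41% evaluation-stage figure (5089/5119).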
Hardware Specification Yes The training of the JiangJun algorithm to the "Master" level was facilitated by our proposed training framework that effectively utilizes the computational capabilities of up to 90 V100 GPUs on the Huawei Cloud ModelArts platform... utilizing a total of 90 V100 GPUs. Specifically, 78 of these GPUs were allocated for the MCTS Actor, 4 GPUs were used for the Training, and 8 GPUs were dedicated to the Populationer... The execution of these experiments relied on the power of high-performance computing, specifically utilizing 40 V100 GPUs.
Software Dependencies No The paper mentions a "ResNets-based neural network" and "ResNet-18 architecture" for the neural network, and the "Simplex method" for solving the LP. However, it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, Python 3.x, CUDA 11.x) that were used to implement these components.
Experiment Setup Yes l = (z - v)^2 - α π^T log p + β ||θ||^2, (5) where α, β are balance constants between 0 and 1, and ||θ||^2 is the L2 weight regularization of the JiangJun agent. ...The hyperparameters of the network and training are provided as follows. network filters: 192, network layers: 10, batch size: 2048, sample games: 500, c_puct: 1.5, saver step: 400, learning rate: [0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001, 0.0003, 0.001, 0.003, 0.01], minimum games in one block: 5000, maximum training blocks: 100, minimum training blocks: 3, number of processes: 10.
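Equation (5) combines a value mean-squared-error term, a policy cross-entropy term weighted by α, and β-weighted L2 regularization (the AlphaZero-style loss). A minimal sketch, assuming purely illustrative inputs; only the symbols z, v, π, p, θ, α, β come from the paper:

```python
import math

# Illustrative sketch of the loss in Eq. (5):
#   l = (z - v)^2 - alpha * pi^T log p + beta * ||theta||^2
# z: game outcome, v: value prediction, pi: MCTS policy target,
# p: network policy (positive entries), theta: network weights.
def loss(z, v, pi, p, theta, alpha, beta):
    value_mse = (z - v) ** 2
    policy_ce = -sum(t * math.log(q) for t, q in zip(pi, p))
    l2 = sum(w * w for w in theta)
    return value_mse + alpha * policy_ce + beta * l2

# A perfectly matched value prediction with uniform policies:
# the value and L2 terms vanish, leaving alpha times the entropy ln 2.
l = loss(z=1.0, v=1.0, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[0.0], alpha=0.5, beta=0.1)
```

Minimizing the cross-entropy term pulls the network policy p toward the MCTS visit distribution π, while the MSE term regresses the value head onto the game outcome z.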