Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play

Authors: Zelai Xu, Chao Yu, Yancheng Liang, Yi Wu, Yu Wang

JMLR 2025

Reproducibility assessment (Variable: Result — LLM response):
Research Type: Experimental — "We evaluate GFXP in matrix games and gridworld domains where GFXP achieves the lowest exploitabilities. We further conduct experiments in a challenging football game where GFXP defeats SOTA models with over 94% win rate."
Researcher Affiliation: Academia — Zelai Xu, Chao Yu, and Yu Wang (Department of Electronic Engineering, Tsinghua University, Beijing 100084, China); Yancheng Liang (School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA); Yi Wu (Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China).
Pseudocode: Yes — Algorithm 1: Self-Play (SP); Algorithm 2: Policy-Space Response Oracles (PSRO); Algorithm 3: Fictitious Cross-Play (FXP).
Open Source Code: No — The paper does not state that the authors' code for GFXP is publicly available, nor does it link to a code repository. It mentions that Tikick's model is released and that PSRO w. BD&RD never released code or models, but says nothing about artifacts for the present work.
Open Datasets: Yes — "Then we use MAPPO (Yu et al., 2021) as an approximate BR oracle and consider a gridworld environment MAgent Battle (Zheng et al., 2018). Finally, with large-scale training, we use GFXP to solve the challenging 11-vs-11 multi-agent full game in the Google Research Football (GRF) (Kurach et al., 2020) environment."
Dataset Splits: No — The paper uses simulation environments such as MAgent Battle and Google Research Football. While it describes scenarios and game configurations (e.g., the 3-vs-3 battle and the 11-vs-11 full-game task), it does not provide dataset splits (e.g., training/validation/test percentages or counts) for a pre-collected dataset, since data is generated dynamically through simulation.
Hardware Specification: Yes — "Each algorithm is trained on a 128-core CPU server for 30k steps." "All algorithms use a recurrent policy and are trained on a single 4090 GPU for 100M environment frames."
Software Dependencies: No — The paper mentions software components such as an SGD optimizer, policy gradient, MAPPO, Adam, and PPO, but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup: Yes — "For SP and FSP, we simply train the single agent for 30k steps. For PSRO with and without reset, we run 30 iterations and the BR policy in each iteration is trained for 1k steps. For GFXP, we run 15 iterations and the main policy and counter policy in each iteration are both trained for 1k steps. The self-play probability η is set to 0.2 and decays exponentially to 0 with a factor of 0.97. All training hyperparameters for different algorithms and BR learning are the same and listed in Table 4. All training hyperparameters for GFXP in GRF are listed in Table 8."
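The reported schedule is easy to check for compute parity across methods. Below is a minimal sketch (hypothetical function and variable names, not the authors' code) of the GFXP schedule described above: 15 iterations, 1k training steps each for the main policy and the counter policy, with the self-play probability η starting at 0.2 and decaying exponentially by a factor of 0.97 per iteration.

```python
# Hypothetical illustration of the GFXP training schedule described in the
# paper's setup; names like `gfxp_schedule` are assumptions, not the
# authors' API.

def gfxp_schedule(iterations=15, steps_per_policy=1_000,
                  eta_init=0.2, eta_decay=0.97):
    """Yield (iteration, eta, cumulative_steps) for each GFXP iteration."""
    eta = eta_init
    total_steps = 0
    for it in range(iterations):
        # Each iteration trains both the main policy and the counter policy.
        total_steps += 2 * steps_per_policy
        yield it, eta, total_steps
        eta *= eta_decay  # exponential decay of the self-play probability

schedule = list(gfxp_schedule())
```

Note that the cumulative step count after 15 iterations is 30k, matching the budgets quoted for SP (30k steps) and PSRO (30 iterations × 1k steps), consistent with the statement that all algorithms share the same training hyperparameters.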