Online Double Oracle

Authors: Le Cong Dinh, Stephen Marcus McAleer, Zheng Tian, Nicolas Perez-Nieves, Oliver Slumbers, David Henry Mguni, Jun Wang, Haitham Bou Ammar, Yaodong Yang

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our algorithm on tens of games including random matrix games, real-world matrix games (Czarnecki et al., 2020), and Kuhn and Leduc Poker. Results show that in almost all games, ODO outperforms both DO and PSRO variants (Lanctot et al., 2017; McAleer et al., 2020), and the online learning baseline MWU (Freund & Schapire, 1999) in terms of exploitability (i.e., distance to an NE) and average payoffs against different types of strategic adversaries.
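Exploitability, the metric quoted above, measures how far a strategy profile is from a Nash equilibrium: the sum of each player's best-response gain over the current profile. A minimal sketch for a two-player zero-sum matrix game (the function name and setup are illustrative, not from the paper's code):

```python
import numpy as np

def exploitability(A, x, y):
    """Exploitability of the profile (x, y) in a zero-sum matrix game
    with payoff matrix A (row player maximises x^T A y).
    It is 0 exactly at a Nash equilibrium."""
    value = x @ A @ y
    row_gain = np.max(A @ y) - value   # row player's best deviation gain
    col_gain = value - np.min(x @ A)   # column player's best deviation gain
    return row_gain + col_gain

# Matching Pennies: the uniform profile is the unique Nash equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
print(exploitability(A, uniform, uniform))  # → 0.0
```

An ODO-style learner drives this quantity toward zero over time, which is why the report uses it as the headline comparison metric.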
Researcher Affiliation | Collaboration | Le Cong Dinh (University of Southampton); Stephen McAleer (University of California, Irvine); Zheng Tian (ShanghaiTech University); Nicolas Perez-Nieves (Imperial College London); Oliver Slumbers (University College London); David Henry Mguni (Huawei R&D UK); Jun Wang (Huawei R&D UK); Haitham Bou Ammar (Huawei R&D UK); Yaodong Yang (Institute for AI, Peking University)
Pseudocode | Yes | Algorithm 1: Online Single Oracle Algorithm
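The Online Single Oracle algorithm named above combines no-regret learning with best-response oracles: a no-regret learner (MWU) runs over a small, growing set of strategies, and a best response to the opponent's observed play is added when it falls outside the current set. A minimal sketch of that loop under these assumptions (`online_single_oracle` and its signature are illustrative names, not the paper's implementation):

```python
import numpy as np

def online_single_oracle(A, opponent_cols, mu=0.01):
    """OSO-style loop for the row player of a zero-sum matrix game with
    payoff matrix A, against an observed sequence of opponent columns."""
    strategy_set = [0]              # start from a single pure strategy
    w = np.ones(1)                  # MWU weights over strategy_set
    counts = np.zeros(A.shape[1])   # opponent's empirical play
    for col in opponent_cols:
        counts[col] += 1
        # Best response to the opponent's average play so far.
        br = int(np.argmax(A @ (counts / counts.sum())))
        if br not in strategy_set:          # expand the effective set...
            strategy_set.append(br)
            w = np.ones(len(strategy_set))  # ...and restart MWU weights
        # MWU update restricted to the current strategy set.
        w = w * np.exp(mu * A[strategy_set, col])
        w = w / w.sum()
    return strategy_set, w
```

The key point the sketch illustrates is that regret is only ever measured over the small `strategy_set`, not the full action space, which is what makes the approach tractable in large games.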
Open Source Code | Yes | All the codes for the experiments are attached as supplementary.
Open Datasets | Yes | We test our algorithm on tens of games including random matrix games, real-world matrix games (Czarnecki et al., 2020), and Kuhn and Leduc Poker.
Dataset Splits | No | The paper mentions random matrix games, real-world matrix games, and Kuhn and Leduc Poker, but does not specify any training, validation, or test splits for these games. The experiments are instead described as repeated runs of game play ('20 seeds for each setting').
Hardware Specification | No | The paper does not provide specific hardware details such as CPU models, GPU models, or memory specifications used for running the experiments. It only mentions that hyperparameters are in Appendix D, which does not contain hardware information.
Software Dependencies | No | The paper mentions tools like OpenSpiel (Lanctot et al., 2019) and MALib (Zhou et al., 2021) but does not specify their version numbers. It also refers to algorithms like MWU, CFR, and XFP without providing specific software implementation versions.
Experiment Setup | Yes | All hyperparameter settings can be found in Appendix D. Learning rate of MWU: when facing an MWU adversary, we set the adversary's learning rate to the optimal value (i.e., µt = √(8 log(A)/T)). That is, we eliminate the case where one can exploit an MWU adversary with a wrong learning rate. For the DO-MWU algorithm, we allow the algorithm to move to the next subgame once it achieves an ϵ-Nash equilibrium with ϵ = 0.01. We therefore choose the learning rate of MWU in a subgame to be µt = 0.01, so that DO-MWU achieves a 0.01-Nash equilibrium in subgames in at most O(log(k)/ϵ²) steps. For a fair comparison with DO methods, we also choose the same learning rate µt = 0.01 for our ODO in the experiments.
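The optimal learning rate and the multiplicative-weights update quoted above can be sketched as follows (a minimal sketch of the standard MWU rule of Freund & Schapire, 1999; the function names are illustrative, not from the paper's code):

```python
import numpy as np

def mwu_learning_rate(num_actions, horizon):
    # The theoretically optimal MWU step size mu_t = sqrt(8 log(A) / T),
    # as set for the adversary in the experiments.
    return np.sqrt(8 * np.log(num_actions) / horizon)

def mwu_step(weights, payoffs, mu):
    # One multiplicative-weights update: exponentially reweight each
    # action by its observed payoff, then renormalise to a distribution.
    weights = weights * np.exp(mu * payoffs)
    return weights / weights.sum()
```

Starting from the uniform distribution and repeatedly applying `mwu_step` against a fixed payoff vector concentrates the weights on the best action, which is the no-regret behaviour the baselines in the paper rely on.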