Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback
Authors: Shishen Lin
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on diverse matrix game benchmarks demonstrate that COEBL not only achieves sublinear regret but also consistently outperforms classical bandit algorithms, including EXP3, the variant EXP3-IX, and UCB. These results highlight the potential of evolutionary bandit learning, particularly the efficacy of randomised optimism via evolutionary algorithms in game-theoretic settings. In this section, we present empirical results comparing the discussed algorithms. We conducted extensive experiments on three matrix games: the RPS game, the DIAGONAL game, and the BIGGERNUMBER game. |
| Researcher Affiliation | Academia | Shishen Lin School of Computer Science, University of Birmingham, Birmingham, United Kingdom EMAIL |
| Pseudocode | Yes | Algorithm 1 COEBL for matrix games |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider the classic matrix game benchmark: Rock-Paper-Scissors games [Littman, 1994; O'Donoghue et al., 2021], and its payoff matrix is defined as follows. DIAGONAL is a pseudo-Boolean maximin benchmark on which Lehre and Lin [2024b] conducted runtime analysis of coevolutionary algorithms. BIGGERNUMBER is another challenging two-player zero-sum game proposed and analysed by [Zhang and Sandholm, 2024]. |
| Dataset Splits | No | The paper describes experiments on matrix games which are not typically partitioned into training/test/validation splits in the traditional machine learning sense. The experiments involve iterations of game play and simulations, but no explicit data splitting strategy is mentioned. |
| Hardware Specification | No | The computations were performed using the University of Birmingham's BlueBEAR high performance computing (HPC) service. This mentions a service but lacks specific hardware details like GPU/CPU models. |
| Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers. |
| Experiment Setup | Yes | For EXP3, we use the exploration rate γ_t = min{√(K log K / t), 1} and learning rate η_t = √(2 log K / (tK)) as suggested in [O'Donoghue et al., 2021]. For the variant EXP3-IX, we use the schedules η_t = t^(−k_η), β_t = t^(−k_β), ε_t = t^(−k_ε) with k_η = 5/8, k_β = 3/8, k_ε = 1/8, as suggested in [Cai et al., 2023]. For COEBL, we set the mutation rate c = 2 for the RPS game and c = 8 for the other games. UCB requires no hyper-parameter. We compute the empirical mean of the regrets and the KL-divergence (or total variation distance), and present the 95% confidence intervals in the plots. We run 50 independent simulations (one per seed, up to 3000 iterations each) for each configuration. |
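
The hyper-parameter schedules quoted above can be sketched as a small Python helper. This is not from the paper's code (which is not released); the function names and defaults are hypothetical, reconstructed from the formulas reported in the Experiment Setup row, where `K` is the number of actions and `t` the 1-indexed round.

```python
import math

def exp3_schedule(t, K):
    """EXP3 rates as reported: gamma_t = min{sqrt(K log K / t), 1},
    eta_t = sqrt(2 log K / (t K))."""
    gamma_t = min(math.sqrt(K * math.log(K) / t), 1.0)
    eta_t = math.sqrt(2 * math.log(K) / (t * K))
    return gamma_t, eta_t

def exp3_ix_schedule(t, k_eta=5 / 8, k_beta=3 / 8, k_eps=1 / 8):
    """Polynomially decaying EXP3-IX schedules: eta_t = t^(-k_eta),
    beta_t = t^(-k_beta), eps_t = t^(-k_eps)."""
    return t ** -k_eta, t ** -k_beta, t ** -k_eps
```

For a 3-action game like RPS, `exp3_schedule(1, 3)` returns an exploration rate clipped at 1.0 in the first round, after which both rates decay at rate 1/√t; the EXP3-IX schedules decay polynomially at the three distinct exponents listed above.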