Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback

Authors: Shishen Lin

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on diverse matrix game benchmarks demonstrate that COEBL not only achieves sublinear regret but also consistently outperforms classical bandit algorithms, including EXP3, the variant EXP3-IX, and UCB. These results highlight the potential of evolutionary bandit learning, particularly the efficacy of randomised optimism via evolutionary algorithms in game-theoretic settings. In this section, we present empirical results comparing the discussed algorithms. We conducted extensive experiments on three matrix games: the RPS game, the DIAGONAL game, and the BIGGERNUMBER game.
Researcher Affiliation | Academia | Shishen Lin, School of Computer Science, University of Birmingham, Birmingham, United Kingdom. EMAIL
Pseudocode | Yes | Algorithm 1: COEBL for matrix games
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We consider the classic matrix game benchmark: Rock-Paper-Scissors games [Littman, 1994; O'Donoghue et al., 2021], and its payoff matrix is defined as follows. DIAGONAL is a pseudo-Boolean maximin benchmark on which Lehre and Lin [2024b] conducted runtime analysis of coevolutionary algorithms. BIGGERNUMBER is another challenging two-player zero-sum game proposed and analysed by Zhang and Sandholm [2024].
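The quoted passage says the Rock-Paper-Scissors payoff matrix "is defined as follows" but the matrix itself was not extracted. A minimal sketch of the standard zero-sum RPS payoff matrix for the row player is shown below; the sign convention and action ordering are assumptions, since the paper's exact matrix is not reproduced here.

```python
import numpy as np

# Standard Rock-Paper-Scissors payoff matrix for the row player,
# with rows/columns ordered (Rock, Paper, Scissors). The paper's
# exact convention may differ; this is an illustrative assumption.
RPS = np.array([
    [ 0, -1,  1],   # Rock:     ties Rock, loses to Paper, beats Scissors
    [ 1,  0, -1],   # Paper:    beats Rock, ties Paper, loses to Scissors
    [-1,  1,  0],   # Scissors: loses to Rock, beats Paper, ties Scissors
])

# Zero-sum sanity check: the column player's payoff is the negation,
# i.e. the matrix is antisymmetric.
assert (RPS == -RPS.T).all()
```

Because the matrix is antisymmetric, the game is symmetric and its unique Nash equilibrium is the uniform mixed strategy (1/3, 1/3, 1/3), which is why RPS is a natural benchmark for bandit algorithms converging to equilibrium play.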
Dataset Splits | No | The paper describes experiments on matrix games, which are not typically partitioned into training/validation/test splits in the traditional machine learning sense. The experiments involve iterations of game play and simulations, but no explicit data-splitting strategy is mentioned.
Hardware Specification | No | The computations were performed using the University of Birmingham's BlueBEAR high-performance computing (HPC) service. This mentions a service but lacks specific hardware details such as GPU/CPU models.
Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers.
Experiment Setup | Yes | For EXP3, we use the exploration rate γ_t = min{√(K log K / t), 1} and the learning rate η_t = √(2 log K / (tK)), as suggested in [O'Donoghue et al., 2021]. For the variant EXP3-IX, we use the settings η_t = t^{-k_η}, β_t = t^{-k_β}, ε_t = t^{-k_ε}, where k_η = 5/8, k_β = 3/8, k_ε = 1/8, as suggested in [Cai et al., 2023]. For COEBL, we set the mutation rate c = 2 for the RPS game and c = 8 for the remaining games. No hyper-parameter is needed for UCB. We compute the empirical mean of the regrets and the KL-divergence (or total variation distance), and present 95% confidence intervals in the plots. We run 50 independent simulations (up to 3000 iterations) for each configuration, over 50 seeds.
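The hyper-parameter schedules above can be sketched as small helper functions. This is a hedged reconstruction: the square-root forms for EXP3 and the negative-exponent reading of the EXP3-IX schedules (η_t = t^{-k_η}, etc.) are inferred from the garbled extraction, not copied verbatim from the paper.

```python
import math

def exp3_rates(t: int, K: int) -> tuple[float, float]:
    """EXP3 schedules at round t for K arms, as reconstructed from the
    report: gamma_t = min(sqrt(K log K / t), 1), eta_t = sqrt(2 log K / (t K))."""
    gamma_t = min(math.sqrt(K * math.log(K) / t), 1.0)
    eta_t = math.sqrt(2 * math.log(K) / (t * K))
    return gamma_t, eta_t

def exp3_ix_rates(t: int,
                  k_eta: float = 5 / 8,
                  k_beta: float = 3 / 8,
                  k_eps: float = 1 / 8) -> tuple[float, float, float]:
    """EXP3-IX polynomially decaying schedules; reading the source as
    eta_t = t^{-k_eta}, beta_t = t^{-k_beta}, eps_t = t^{-k_eps} is an
    assumption about the garbled notation."""
    return t ** -k_eta, t ** -k_beta, t ** -k_eps
```

For the 3-action RPS game, for example, the EXP3 exploration rate starts saturated at 1 (since √(3 log 3) > 1) and decays toward 0 as t grows, while all three EXP3-IX rates equal 1 at t = 1 and decay at their respective polynomial speeds.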