Best of Both Worlds: Regret Minimization versus Minimax Play

Authors: Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally compare our Algorithm 2 for EFGs to the standard OMD algorithm with dilated KL (Kozuno et al., 2021) as well as to minimax play. Our evaluations confirm our theoretical findings, revealing that Algorithm 2 can achieve the best of both no-regret algorithms and minimax play."
Researcher Affiliation | Collaboration | 1 ETH Zürich, 2 Google Research, 3 Aarhus University, 4 EPFL. Correspondence to: Adrian Müller <EMAIL>.
Pseudocode | Yes | "Algorithm 1: Phased Aggression with Importance-Weighting"
Open Source Code | Yes | "We provide the code in the supplementary material."
Open Datasets | Yes | "We consider Kuhn poker (Kuhn, 1950), which serves as a simple yet fundamental example of two-player zero-sum imperfect-information EFGs. Kuhn poker is a common 3-card simplification of standard poker, where each player selects one card from the deck {Jack, Queen, King} without replacement and initially bets one unit." (Footnote 3: https://en.wikipedia.org/wiki/Kuhn_poker)
Dataset Splits | No | The paper describes experiments in a game environment (Kuhn poker) with repeated play over T rounds, but does not provide the training/validation/test splits typical of data-driven experiments.
Hardware Specification | No | The paper reports experiments but does not specify hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions the dilated KL divergence as part of the algorithm but does not list specific software libraries or version numbers.
Experiment Setup | Yes | "In all algorithms, we used the same fixed learning rates (η 1/T) and the (unbalanced) dilated KL divergence for fairness and simplicity. We consider two types of experiments: First, we run the three algorithms against each other... Second, we evaluate how well each algorithm allows Alice to exploit exploitable strategies... We repeat each experiment 5 times." T = 1000 rounds.
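The experimental protocol quoted above — two no-regret learners playing a zero-sum game for T = 1000 rounds and the average strategies being evaluated for exploitability — can be sketched in simplified form. This is not the paper's Algorithm 2 or its dilated-KL OMD baseline (those operate on the EFG treeplex of Kuhn poker); it is a hypothetical stand-in that runs entropic online mirror descent (multiplicative weights) on a toy payoff matrix, to illustrate the repeated-play setup only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(3, 3))  # toy zero-sum payoff matrix (row player maximizes)

def mw_step(p, grad, eta):
    """One entropic-OMD / multiplicative-weights update: p_i ∝ p_i * exp(eta * grad_i)."""
    q = p * np.exp(eta * grad)
    return q / q.sum()

T = 1000                      # rounds, matching the experiments above
eta = 1.0 / np.sqrt(T)        # a fixed learning rate (assumed schedule, for illustration)
x = np.ones(3) / 3            # row player's mixed strategy
y = np.ones(3) / 3            # column player's mixed strategy
x_avg = np.zeros(3)
y_avg = np.zeros(3)

for t in range(T):
    x_avg += x / T
    y_avg += y / T
    gx = A @ y                # row player's payoff gradient at the current profile
    gy = -(A.T @ x)           # column player (minimizer) sees the negated gradient
    x = mw_step(x, gx, eta)
    y = mw_step(y, gy, eta)

# Exploitability of the average profile: total gain available to best responders.
# For no-regret self-play this converges to 0 as T grows; a minimax strategy
# would have zero exploitability from the start but would not exploit weak opponents.
exploit = (A @ y_avg).max() - (x_avg @ A).min()
print(f"exploitability of average play: {exploit:.4f}")
```

Running the three algorithms against each other, or against fixed exploitable strategies as in the paper's second experiment, amounts to swapping one of the two update rules in the loop.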