Best of Both Worlds: Regret Minimization versus Minimax Play
Authors: Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally compare our Algorithm 2 for EFGs to the standard OMD algorithm with dilated KL (Kozuno et al., 2021) as well as to minimax play. Our evaluations confirm our theoretical findings, revealing that Algorithm 2 can achieve the best of both no-regret algorithms and minimax play. |
| Researcher Affiliation | Collaboration | 1ETH Zürich 2Google Research 3Aarhus University 4EPFL. Correspondence to: Adrian Müller <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Phased Aggression with Importance-Weighting |
| Open Source Code | Yes | We provide the code in the supplementary material. |
| Open Datasets | Yes | We consider Kuhn poker (Kuhn, 1950), which serves as a simple yet fundamental example of two-player zero-sum imperfect information EFGs. Kuhn poker is a common 3-card simplification of standard poker, where each player selects one card from the deck {Jack, Queen, King} without replacement and initially bets one unit. (See https://en.wikipedia.org/wiki/Kuhn_poker.) |
| Dataset Splits | No | The paper describes experiments in a game environment (Kuhn Poker) involving repeated play for T rounds, but does not provide specific training/test/validation dataset splits typically found in data-driven experiments. |
| Hardware Specification | No | The paper mentions running experiments but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'dilated KL divergence' as part of the algorithm but does not list any specific software libraries or their version numbers. |
| Experiment Setup | Yes | In all algorithms, we used the same fixed learning rates (η ∝ 1/T) and the (unbalanced) dilated KL divergence for fairness and simplicity. We consider two types of experiments: First, we run the three algorithms against each other... Second, we evaluate how well each algorithm allows Alice to exploit exploitable strategies... We repeat each experiment 5 times, with T = 1000 rounds. |
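To illustrate the setup the table describes (a no-regret learner with a fixed learning rate facing an exploitable opponent over T rounds), here is a minimal sketch. It is not the paper's Algorithm 2 or its EFG experiments; it runs online mirror descent with an entropic regularizer (multiplicative weights) on matching pennies, a two-player zero-sum game with minimax value 0, against a static non-minimax opponent. The function name `run_omd`, the learning rate choice, and the opponent's strategy are all illustrative assumptions.

```python
import numpy as np

def run_omd(T=1000, eta=None):
    """Entropic OMD (multiplicative weights) on matching pennies,
    against a fixed, exploitable column player. Illustrative sketch only."""
    # Row player's (Alice's) payoff matrix; minimax value of the game is 0.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    if eta is None:
        eta = 1.0 / np.sqrt(T)       # a common fixed-learning-rate choice
    x = np.array([0.5, 0.5])         # Alice starts from the uniform strategy
    y = np.array([0.7, 0.3])         # static, exploitable (non-minimax) opponent
    total = 0.0
    for _ in range(T):
        total += x @ A @ y           # Alice's expected payoff this round
        grad = A @ y                 # gradient of Alice's payoff in x
        x = x * np.exp(eta * grad)   # exponentiated-gradient (OMD) update
        x /= x.sum()                 # project back onto the simplex
    return total / T                 # average payoff over T rounds

avg = run_omd()
```

A minimax (equilibrium) player would average exactly 0 here regardless of the opponent; the no-regret learner instead adapts and earns a strictly positive average payoff against this exploitable opponent, which is the kind of exploitation the paper's second experiment measures.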