Solving Zero-Sum Convex Markov Games

Authors: Fivos Kalogiannis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ian Gemp, Georgios Piliouras

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Numerical Results. We demonstrate Algorithms 1 and 2 on an iterated version of rock-paper-scissors-dummy in which each player remembers the actions selected in the previous round; hence, the previous joint action constitutes the state of the Markov game. The dummy action is dominated by every other action, so the Nash equilibrium of the stage game is uniform over rock, paper, and scissors with zero mass on dummy. We set the step-sizes τx = τy = 0.1 and vary the regularization coefficient µ to demonstrate its effect in biasing convergence. Figure 1: Exploitability decays towards a small but positive value corresponding to the bias introduced by the regularization coefficient µ. Results are averaged over 100 trials, each running the algorithm from a different randomly initialized policy profile. The leftmost plot reports results for Algorithm 2; the right two report results for Algorithm 1 with Tin = 10 and 100, respectively.
Researcher Affiliation | Collaboration | 1 Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA; 2 Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA, and Archimedes, Athena Research Center, Greece; 3 Google DeepMind, London, UK. Correspondence to: Fivos Kalogiannis <EMAIL>.
Pseudocode | Yes | Algorithm 1 Nest-PG: Nested Policy Gradient
input: (x0, y0), step-sizes τx, τy, regularization coefficient µ > 0
for t = 1 to Tout do
    yt,0 ← yt−1
    for s = 1 to Tin do
        yt,s ← ΠY[yt,s−1 + τy ∇̂y Uµ(xt−1, yt,s−1)]
    end for
    yt ← yt,Tin
    xt ← ΠX[xt−1 − τx ∇̂x U(xt−1, yt)]
end for
pick t⋆ ∈ {1, . . . , T} as the best iterate
output: (xt⋆, yt⋆+1)
Open Source Code | No | The paper does not explicitly state that the code is open-sourced, nor does it link to a code repository in the main text or the supplementary sections mentioned.
Open Datasets | No | We demonstrate Algorithms 1 and 2 on an iterated version of rock-paper-scissors-dummy where each player remembers the actions selected in the previous round. Hence, the previous joint action constitutes the state in the Markov game. The dummy action is dominated by all other actions such that the Nash equilibrium of the stage game is uniform across rock, paper, and scissors with zero mass on dummy.
Dataset Splits | No | The paper's experiments use an iterated version of rock-paper-scissors-dummy, a simulated game environment; no external datasets requiring training/validation/test splits are involved.
Hardware Specification | No | The paper does not report the hardware used to run the numerical experiments.
Software Dependencies | No | The paper does not specify software dependencies or version numbers needed for reproducibility.
Experiment Setup | Yes | We set the step-sizes τx = τy = 0.1 and vary the regularization coefficient µ to demonstrate its effect in biasing convergence.
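As a sanity check on the environment described in the evidence above, the rock-paper-scissors-dummy stage game can be reconstructed and its claimed equilibrium verified. The payoff matrix below is an assumed instantiation (the paper does not print it), chosen so that dummy is dominated and the game is zero-sum:

```python
import numpy as np

# Assumed payoff matrix for rock-paper-scissors-dummy. Rows index the
# x-player's action, columns the y-player's action, entries are the
# y-player's payoff (zero-sum, so the matrix is antisymmetric).
# Actions: 0 = rock, 1 = paper, 2 = scissors, 3 = dummy.
A = np.array([
    [ 0,  1, -1, -1],
    [-1,  0,  1, -1],
    [ 1, -1,  0, -1],
    [ 1,  1,  1,  0],
], dtype=float)

def exploitability(x, y):
    """Sum of both players' best-response gaps at the profile (x, y)."""
    return (A.T @ x).max() - (A @ y).min()

# Dummy (column 3) is dominated for the y-player: rock (column 0) does at
# least as well against every x-action and strictly better against some.
assert np.all(A[:, 0] >= A[:, 3]) and np.any(A[:, 0] > A[:, 3])

# Uniform play over rock/paper/scissors with zero mass on dummy is a Nash
# equilibrium of the stage game: its exploitability is exactly zero.
uniform = np.array([1/3, 1/3, 1/3, 0.0])
assert abs(exploitability(uniform, uniform)) < 1e-12
```

Since the paper runs the *iterated* version (the state is the previous joint action), this only checks the stage-game claims quoted in the table, not the Markov game itself.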
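The nested structure of Algorithm 1 can likewise be sketched on the stage game. Everything below is a hypothetical instantiation consistent with the pseudocode, not the paper's implementation: the payoff matrix, the regularized utility U^µ(x, y) = xᵀAy − (µ/2)‖y‖², and the hyperparameters are assumptions, and the paper runs the iterated Markov version rather than this one-shot game.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1.0), 0.0)

# Assumed y-player payoff matrix for rock-paper-scissors-dummy
# (rows = x's action, columns = y's action, dummy = index 3).
A = np.array([
    [ 0,  1, -1, -1],
    [-1,  0,  1, -1],
    [ 1, -1,  0, -1],
    [ 1,  1,  1,  0],
], dtype=float)

def nest_pg(tau_x=0.1, tau_y=0.1, mu=0.05, T_out=500, T_in=50, seed=0):
    """Nest-PG sketch: the inner loop runs projected gradient ascent on the
    mu-regularized utility U_mu(x, y) = x^T A y - (mu/2)||y||^2, the outer
    loop runs projected gradient descent on U(x, y) = x^T A y in x."""
    rng = np.random.default_rng(seed)
    x = project_simplex(rng.random(4))
    y = project_simplex(rng.random(4))
    best = np.inf
    for _ in range(T_out):
        for _ in range(T_in):  # inner loop: approximate regularized best response
            y = project_simplex(y + tau_y * (A.T @ x - mu * y))
        x = project_simplex(x - tau_x * (A @ y))  # outer descent step for x
        gap = (A.T @ x).max() - (A @ y).min()     # exploitability of (x, y)
        best = min(best, gap)                     # track the best iterate
    return x, y, best

x, y, best = nest_pg()
```

Consistent with Figure 1 as described, the regularization biases the solution, so the tracked exploitability plateaus at a small positive value rather than reaching zero, while both players drive the dominated dummy action toward zero mass.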