Approximate State Abstraction for Markov Games

Authors: Hiroki Ishibashi, Kenshi Abe, Atsushi Iwasaki

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate our state abstraction with Markov Soccer, compute equilibrium policies, and examine the results. ... This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021). ... We first compute the number of states in the abstract Markov soccer game for different values of ϵ in Figure 2. ... Figure 3 illustrates the duality gap over the number of learning iterations, where the x- and y-axes represent learning iterations and the gap, respectively, varying ϵ.
Researcher Affiliation | Collaboration | Hiroki Ishibashi1, Kenshi Abe1,2, Atsushi Iwasaki1; 1The University of Electro-Communications, 2CyberAgent; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 illustrates the procedure with finite T iterations. Algorithm 1: Minimax Q-learning
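The paper's Algorithm 1 is not reproduced in this review. As context, the core of minimax Q-learning is a tabular Q-update whose target uses the maximin value of the stage matrix game at the next state, obtained by solving a small linear program. Below is a minimal sketch, not the authors' implementation: it assumes numpy and scipy (neither is mentioned in the paper) and a hypothetical tabular layout `Q[s, a, o]` indexed by state, own action, and opponent action.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q_s):
    """Maximin value and mixed strategy for the row player of a
    zero-sum matrix game Q_s (rows: own actions, cols: opponent actions)."""
    m, n = Q_s.shape
    # Variables: [v, x_1, ..., x_m]; maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[0] = -1.0
    # For each opponent action j:  v - sum_i x_i * Q_s[i, j] <= 0
    A_ub = np.hstack([np.ones((n, 1)), -Q_s.T])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, 1:] = 1.0
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1:]

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha, gamma=0.9):
    """One minimax-Q backup for an observed transition (s, a, o, r, s_next)."""
    Q[s, a, o] = (1.0 - alpha) * Q[s, a, o] + alpha * (r + gamma * V[s_next])
    V[s], _ = matrix_game_value(Q[s])
```

For matching pennies, `matrix_game_value(np.array([[1., -1.], [-1., 1.]]))` returns a value of 0 with the uniform strategy, which is a quick sanity check for the LP formulation.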
Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021).
Dataset Splits | No | The paper uses the Markov Soccer environment, a simulation in which agents learn, rather than a pre-collected dataset requiring explicit train/test/validation splits. The experimental setup describes learning iterations and parameters but not dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions using "Minimax Q-learning" as an algorithm but does not specify any software names with version numbers, libraries, or frameworks used for its implementation.
Experiment Setup | Yes | We here assume that the total number of learning iterations T is 1,000,000, the discount factor γ is 0.9, and the learning rate α_t is set to 10^(−2t/T) for learning iterations t ≥ 0.
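The learning-rate formula is damaged in the extracted text; one plausible reading, given the "10 2 T t" fragment, is α_t = 10^(−2t/T), which decays from 1.0 at t = 0 to 0.01 at t = T. That reading is an assumption, not confirmed by the review. Under it, the reported setup can be sketched as:

```python
# Hyperparameters reported in the paper's experiment setup.
T = 1_000_000   # total number of learning iterations
gamma = 0.9     # discount factor

def alpha(t: int, total: int = T) -> float:
    """Assumed schedule alpha_t = 10**(-2*t/T):
    1.0 at t = 0, 0.1 at t = T/2, 0.01 at t = T."""
    return 10.0 ** (-2.0 * t / total)
```

For example, `alpha(0)` gives 1.0 and `alpha(T)` gives 0.01, i.e. the learning rate shrinks by two orders of magnitude over the run.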