Approximate State Abstraction for Markov Games
Authors: Hiroki Ishibashi, Kenshi Abe, Atsushi Iwasaki
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate our state abstraction with Markov Soccer, compute equilibrium policies, and examine the results. ... This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021). ... We first compute the number of states in the abstract Markov soccer game for different values of ϵ in Figure 2. ... Figure 3 illustrates the duality gap over the number of learning iterations, where the x- and y-axes represent learning iterations and the gap, respectively, varying ϵ. |
| Researcher Affiliation | Collaboration | Hiroki Ishibashi¹, Kenshi Abe¹⁺², Atsushi Iwasaki¹ — ¹The University of Electro-Communications, ²CyberAgent. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 illustrates the procedure with finite T iterations. Algorithm 1: Minimax Q-learning |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021). |
| Dataset Splits | No | The paper uses the Markov Soccer environment, which is a simulation where agents learn, rather than a pre-collected dataset requiring explicit train/test/validation splits. The experimental setup describes learning iterations and parameters but not dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using "Minimax Q-learning" as an algorithm but does not specify any software names with version numbers, libraries, or frameworks used for its implementation. |
| Experiment Setup | Yes | We here assume that the total number of learning iterations T is 1,000,000, the discount factor γ is 0.9, and the learning rate αt is set to 10²/(T + t) for learning iterations t ≥ 0. |
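The Pseudocode and Experiment Setup rows reference Minimax Q-learning (Littman 1994) with γ = 0.9 and a decaying learning rate, but the paper's Algorithm 1 is not reproduced here. The sketch below is a minimal, hedged illustration of the standard Minimax-Q update, not the authors' implementation: the state value is the value of the zero-sum matrix game induced by the Q-table at the next state, computed here via a linear program (SciPy's `linprog` is assumed to be available; the function and variable names are ours, not the paper's).

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(Q):
    """Value of a zero-sum matrix game Q (row player maximises, column minimises).

    Solves: max v  s.t.  sum_a pi_a * Q[a, o] >= v  for every opponent action o,
            pi is a probability distribution over the row player's actions.
    """
    n_a, n_o = Q.shape
    # Decision variables: [pi_1, ..., pi_{n_a}, v]; linprog minimises, so use -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For each opponent action o:  v - sum_a pi_a * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Probabilities sum to one (v unconstrained in the equality).
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1]


def minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma=0.9):
    """One Minimax-Q step on Q, a dict mapping state -> (n_a x n_o) payoff table.

    The paper reports gamma = 0.9 and a decaying learning rate alpha_t;
    the caller supplies alpha following whatever schedule it uses.
    """
    v_next = matrix_game_value(Q[s_next])
    Q[s][a, o] += alpha * (r + gamma * v_next - Q[s][a, o])
```

As a sanity check, `matrix_game_value` on the matching-pennies payoff matrix `[[1, -1], [-1, 1]]` returns 0, the game's known value under the uniform mixed strategy.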