Approximate State Abstraction for Markov Games
Authors: Hiroki Ishibashi, Kenshi Abe, Atsushi Iwasaki
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate our state abstraction with Markov Soccer, compute equilibrium policies, and examine the results. ... This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021). ... We first compute the number of states in the abstract Markov soccer game for different values of ϵ in Figure 2. ... Figure 3 illustrates the duality gap over the number of learning iterations, where the x- and y-axes represent learning iterations and the gap, respectively, varying ϵ. |
| Researcher Affiliation | Collaboration | Hiroki Ishibashi¹, Kenshi Abe¹⁺², Atsushi Iwasaki¹ — ¹The University of Electro-Communications, ²CyberAgent. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 illustrates the procedure with finite T iterations. Algorithm 1: Minimax Q-learning |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | This section demonstrates our state abstraction developed so far in Markov Soccer (Littman 1994; Abe and Kaneko 2021). |
| Dataset Splits | No | The paper uses the Markov Soccer environment, which is a simulation where agents learn, rather than a pre-collected dataset requiring explicit train/test/validation splits. The experimental setup describes learning iterations and parameters but not dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using "Minimax Q-learning" as an algorithm but does not specify any software names with version numbers, libraries, or frameworks used for its implementation. |
| Experiment Setup | Yes | We here assume that the total number of learning iterations T is 1,000,000, the discount factor γ is 0.9, and the learning rate αt is set to 10²/(T + t) for learning iterations t ≥ 0. |
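The Pseudocode and Experiment Setup rows reference Minimax Q-learning (Littman 1994) with γ = 0.9 and a decaying learning rate, but the paper's Algorithm 1 is not reproduced here. The sketch below is a minimal, hedged illustration of the standard Minimax-Q update, not the authors' implementation: the state value is the value of the zero-sum matrix game induced by the Q-table at the next state, computed here via a linear program (SciPy's `linprog` is assumed to be available; the function and variable names are ours, not the paper's).

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(Q):
    """Value of a zero-sum matrix game Q (row player maximises, column minimises).

    Solves: max v  s.t.  sum_a pi_a * Q[a, o] >= v  for every opponent action o,
            pi is a probability distribution over the row player's actions.
    """
    n_a, n_o = Q.shape
    # Decision variables: [pi_1, ..., pi_{n_a}, v]; linprog minimises, so use -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For each opponent action o:  v - sum_a pi_a * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Probabilities sum to one (v unconstrained in the equality).
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1]


def minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma=0.9):
    """One Minimax-Q step on Q, a dict mapping state -> (n_a x n_o) payoff table.

    The paper reports gamma = 0.9 and a decaying learning rate alpha_t;
    the caller supplies alpha following whatever schedule it uses.
    """
    v_next = matrix_game_value(Q[s_next])
    Q[s][a, o] += alpha * (r + gamma * v_next - Q[s][a, o])
```

As a sanity check, `matrix_game_value` on the matching-pennies payoff matrix `[[1, -1], [-1, 1]]` returns 0, the game's known value under the uniform mixed strategy.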