Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach
Authors: Johan Peralez, Aurélien Delage, Jacopo Castellini, Rafael F. Cunha, Jilles S. Dibangoye
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two- as well as many-agent domains from the literature against ϵ-optimal simultaneous-move solvers confirm the superiority of our novel approach. This section presents the results of our experiments, which compare the sequential planning approach with its simultaneous counterpart as employed in many leading-edge global multi-agent planning and reinforcement learning algorithms, using oSARSA (cf. Algorithm 1) as the standard algorithmic scheme. |
| Researcher Affiliation | Academia | 1. Univ Lyon, INSA Lyon, Inria, CITI, EA3720, 69621 Villeurbanne, France; 2. Haute École de Gestion de Genève, University of Applied Sciences and Arts Western Switzerland, 1227 Carouge, Geneva, Switzerland; 3. Bernoulli Institute, University of Groningen, Nijenborgh 4, NL-9747AG, Groningen, Netherlands. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: oSARSA(ϵ). 1: Init a_{0:ℓ−1} ← blind policy, α(·) ← 0, g ← −∞. 2: foreach episode do 3: Init SOC s_0 ← b_0 and step τ_0 ← 0. 4: for τ = 0 to ℓ−1 do 5: Select ϵ-greedily a_τ^{s_τ} w.r.t. α_{τ+1}. 6: Compute SOC s_{τ+1} ← T(s_τ, a_τ^{s_τ}). 7: if Accept(a_τ^{s_τ}, g_{τ+1}, α_{τ+1}(s_{τ+1})) then 8: a_{0:ℓ−1} ← ⟨a_{0:τ−1}, a_τ^{s_τ}, a_{τ+1:ℓ−1}⟩. 9: g_{τ+1} ← α_{τ+1}(s_{τ+1}). 11: for τ = τ_0 down to 0 do 12: Update α_τ backward. |
| Open Source Code | Yes | The source code is available at https://git.lwp.rug.nl/ml-rug/osarsa-aaai-25. |
| Open Datasets | Yes | We have comprehensively assessed various algorithms using several two-agent benchmarks from academic literature, available at masplan.org. These benchmarks include mabc, recycling, grid3x3, boxpushing, mars, and tiger. |
| Dataset Splits | No | No specific dataset splits (e.g., train/test/validation percentages or counts) are provided. The paper describes experiments on various multi-agent domains with different planning horizons and uses an episodic reinforcement learning algorithm, which typically involves interaction with an environment rather than fixed dataset splits. |
| Hardware Specification | Yes | All experiments were run on an Ubuntu machine with 16 GB of available RAM and a 1.8 GHz processor, using only one core and a time limit of 1 hour. |
| Software Dependencies | No | The paper mentions using 'ILOG CPLEX Optimization Studio' but does not specify a version number. No other software dependencies are listed with specific version information. |
| Experiment Setup | Yes | Algorithm 1 proceeds by generating trajectories of SOCs instead of OCs, one trajectory per episode, guided by sequential ϵ-greedy exploration steps. A portfolio of heuristic policies guides the selection of the next sequential action to enhance exploration; this portfolio includes the random policy, the underlying MDP policy, and the blind policy (Hauskrecht 2000). An acceptance rule based on Simulated Annealing (SA) (Rutenbar 1989) prevents the algorithm from getting stuck in a local optimum: a modification of the current sequential policy is kept not only when performance improves but also otherwise, with a probability that depends on the loss and on a temperature coefficient that decreases over time. |
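The SA-based acceptance rule in the experiment setup can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact acceptance probability is not given in the excerpt, so the classic Metropolis criterion is assumed, and the function and parameter names (`sa_accept`, `candidate_value`, `incumbent_value`, `temperature`) are hypothetical.

```python
import math
import random


def sa_accept(candidate_value, incumbent_value, temperature, rng=random):
    """Simulated-annealing acceptance rule (sketch).

    Keeps a modification of the current sequential policy if it improves
    performance, and otherwise keeps it with a probability that shrinks
    with the loss and with the temperature (Metropolis criterion assumed).
    """
    loss = incumbent_value - candidate_value  # positive when the candidate is worse
    if loss <= 0:
        return True  # improvement (or tie): always accept
    # Worse candidate: accept with probability exp(-loss / temperature).
    return rng.random() < math.exp(-loss / temperature)
```

In a typical SA schedule, `temperature` starts high (so most modifications are accepted and the search explores broadly) and is decayed across episodes, so that late in training only improving modifications survive.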