reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems

Authors: Joshua Holder, Natasha Jaques, Mehran Mesbahi

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical Experiments We first test REDA in a simple SAP setting to provide intuition about why it is able to outperform existing methods in the literature. Then, we scale it up, applying it to a complex satellite constellation task allocation environment with hundreds of satellites and tasks, showing the power and efficiency of this method. Figure 2: Performance over 5 runs of various algorithms in dictator environment... Figure 3: Performance over 5 runs of various algorithms in a realistic constellation environment...
Researcher Affiliation	Academia	Joshua Holder¹, Natasha Jaques², Mehran Mesbahi¹. ¹Department of Aeronautics and Astronautics, University of Washington, Seattle, WA 98195. ²Department of Computer Science, University of Washington, Seattle, WA 98195.
Pseudocode	Yes	Algorithm 1: RL-Enabled Distributed Assignment (REDA). Given: state-dependent benefit function $\hat{\beta} : S \to \mathbb{R}^{n \times m}$. 1: Initialize Q-network parameters $\theta$, target Q-network parameters $\theta^{-} = \theta$. 2: Initialize a replay buffer $D$. 3: for episode $e = 1, 2, \ldots$ do
Open Source Code	Yes	Code https://github.com/Rainlabuw/rl-enableddistributed-assignment
Open Datasets	No	No specific public datasets are mentioned, nor is a link or formal citation provided for the custom-generated satellite constellation environment described in the paper. The paper states: "We generate a constellation of 324 satellites evenly distributed around the Earth, with 450 randomly placed tasks simulating internet users."
Dataset Splits	No	The paper describes generating a simulation environment for experiments (a constellation of satellites and tasks) but does not specify explicit training, validation, or test splits for this data in a conventional manner. It refers to running simulations over time steps rather than predefined dataset splits.
Hardware Specification	No	The paper mentions "compute requirements" in the supplementary materials but does not provide specific hardware details (like GPU models, CPU types, or memory) in the main text. It states: "See the supplemental materials on arXiv for further details on hyperparameter selection, network architecture, and compute requirements."
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers in the main text. It only mentions general concepts like "DQN paradigm" and refers to other algorithms.
Experiment Setup	Yes	With probability $\epsilon$: $x_k \leftarrow \alpha(\hat{\beta}(s_k))$ (act greedily w/r/t the current benefit matrix) ... $\epsilon$ is decayed to zero over 300k time steps. ... observations are limited to information on the top 10 closest tasks, the previous assignment $x^i_{k-1}$, and the power state $p^i$, as well as information related to the nearest 10 satellites to satellite $i$ in orbit. Similarly, in reality satellites can only complete a subset of the tasks at a given time. Thus, we limit the size of the action space to 11, the first 10 corresponding to an assignment to the top 10 closest tasks, with the remaining action corresponding to completing any other task.
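
The Experiment Setup row quotes two mechanics that are easy to misread in prose: an ϵ that decays to zero over 300k time steps, and an 11-way action space (the ten closest tasks plus one "any other task" action). The Python sketch below illustrates both under stated assumptions. The linear decay shape, the starting value of 1.0, the Euclidean distance metric, and every function name are hypothetical choices made for illustration; the quoted excerpt gives only the decay horizon and the action-space size, so this is not the authors' implementation.

```python
import math

DECAY_STEPS = 300_000     # from the excerpt: epsilon reaches zero after 300k time steps
TOP_K = 10                # from the excerpt: the top 10 closest tasks
NUM_ACTIONS = TOP_K + 1   # 10 per-task actions plus one "any other task" action


def epsilon_at(step, eps_start=1.0):
    """Decay epsilon linearly from eps_start to 0 (shape and eps_start are assumed)."""
    frac = min(max(step, 0), DECAY_STEPS) / DECAY_STEPS
    return eps_start * (1.0 - frac)


def closest_tasks(sat_pos, task_positions, k=TOP_K):
    """Return indices of the k tasks nearest to sat_pos (Euclidean distance is assumed)."""
    order = sorted(
        range(len(task_positions)),
        key=lambda j: math.dist(sat_pos, task_positions[j]),
    )
    return order[:k]


def action_to_task(action, nearest):
    """Map an action index to a task index; None stands for "any other task"."""
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action must be in [0, {NUM_ACTIONS}), got {action}")
    return nearest[action] if action < len(nearest) else None
```

With twelve tasks laid out along a line, `closest_tasks((0.0, 0.0), tasks)` keeps the ten nearest, and action index 10 maps to `None`, which stands in for the catch-all action described in the excerpt.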