Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems

Authors: Joshua Holder, Natasha Jaques, Mehran Mesbahi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical Experiments: We first test REDA in a simple SAP setting to provide intuition about why it is able to outperform existing methods in the literature. Then, we scale it up, applying it to a complex satellite constellation task allocation environment with hundreds of satellites and tasks, showing the power and efficiency of this method. Figure 2: Performance over 5 runs of various algorithms in dictator environment... Figure 3: Performance over 5 runs of various algorithms in a realistic constellation environment...
Researcher Affiliation | Academia | Joshua Holder¹, Natasha Jaques², Mehran Mesbahi¹. ¹Department of Aeronautics and Astronautics, University of Washington, Seattle, WA 98195. ²Department of Computer Science, University of Washington, Seattle, WA 98195. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: RL-Enabled Distributed Assignment (REDA). Given: state-dependent benefit function β̂ : S → ℝ^{n×m}. 1: Initialize Q-network parameters θ, target Q-network parameters θ⁻ = θ. 2: Initialize a replay buffer D. 3: for episode e = 1, 2, ... do
Open Source Code | Yes | Code https://github.com/Rainlabuw/rl-enableddistributed-assignment
Open Datasets | No | No specific public datasets are mentioned, nor is a link or formal citation provided for the custom-generated satellite constellation environment described in the paper. The paper states: "We generate a constellation of 324 satellites evenly distributed around the Earth, with 450 randomly placed tasks simulating internet users."
Dataset Splits | No | The paper describes generating a simulation environment for experiments (a constellation of satellites and tasks) but does not specify explicit training, validation, or test splits for this data in a conventional manner. It refers to running simulations over time steps rather than predefined dataset splits.
Hardware Specification | No | The paper mentions "compute requirements" in the supplementary materials but does not provide specific hardware details (like GPU models, CPU types, or memory) in the main text. It states: "See the supplemental materials on arXiv for further details on hyperparameter selection, network architecture, and compute requirements."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers in the main text. It only mentions general concepts like "DQN paradigm" and refers to other algorithms.
Experiment Setup | Yes | With probability ϵ: x_k ← α(β̂(s_k)) (act greedily w/r/t the current benefit matrix) ... ϵ is decayed to zero over 300k time steps. ... observations are limited to information on the top 10 closest tasks, the previous assignment x^i_{k−1}, and the power state p^i, as well as information related to the nearest 10 satellites to satellite i in orbit. Similarly, in reality satellites can only complete a subset of the tasks at a given time. Thus, we limit the size of the action space to 11, the first 10 corresponding to an assignment to the top 10 closest tasks, with the remaining action corresponding to completing any other task.
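
The exploration rule and the 11-way action space quoted under Experiment Setup can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The source says only that ϵ reaches zero after 300k time steps and that, with probability ϵ, the assignment is taken greedily from the current benefit matrix. The linear schedule starting at 1.0, the brute-force alpha (a stand-in for whatever assignment solver the paper uses, practical only for tiny matrices), and the names epsilon_at, local_action_to_task, choose_assignment, and learned_assignment are all hypothetical.

```python
import itertools
import random


def epsilon_at(step, start=1.0, decay_steps=300_000):
    """Assumed linear decay of epsilon from `start` to zero over `decay_steps`."""
    return max(0.0, start * (1.0 - step / decay_steps))


def alpha(benefits):
    """Brute-force stand-in for the assignment operator alpha.

    benefits[i][j] is the benefit of agent i taking task j (n agents,
    m >= n tasks). Returns one distinct task index per agent such that
    the total benefit is maximal. Exponential cost: small inputs only.
    """
    n, m = len(benefits), len(benefits[0])
    return max(
        itertools.permutations(range(m), n),
        key=lambda tasks: sum(benefits[i][j] for i, j in enumerate(tasks)),
    )


def local_action_to_task(action, distances, k=10):
    """Map one of k + 1 local actions to a task index.

    Actions 0..k-1 pick the k closest tasks in order of distance.
    Action k means "any other task" and is returned as None here.
    """
    closest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    return closest[action] if action < len(closest) else None


def choose_assignment(step, benefits, learned_assignment, rng=random):
    """With probability epsilon, act greedily on the benefit matrix.

    Otherwise return `learned_assignment`, a hypothetical placeholder
    for whatever assignment the learned policy would produce.
    """
    if rng.random() < epsilon_at(step):
        return alpha(benefits)
    return tuple(learned_assignment)
```

With two agents and three tasks, alpha([[1, 5, 2], [4, 1, 3]]) gives agent 0 task 1 and agent 1 task 0, for a total benefit of 9. Early in training choose_assignment almost always returns that greedy assignment, and from step 300,000 onward it always returns the learned one.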