Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning
Authors: Laixi Shi, Jingchu Gai, Eric Mazumdar, Yuejie Chi, Adam Wierman
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of the other agents. We first establish the well-posedness of this class of RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs, regardless of the uncertainty set formulation. |
| Researcher Affiliation | Academia | 1 Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA; 2 Machine Learning Department, Carnegie Mellon University, PA, USA; 3 Department of Computing and Mathematical Sciences, California Institute of Technology, CA, USA; 4 Department of Statistics and Data Science, Yale University, CT, USA. |
| Pseudocode | Yes | Algorithm 1 (N-sample estimation of the policy π_h = {π_{j,h}}_{j∈[n]} for each agent i and step h) and Algorithm 2 (Robust-Q-FTRL). |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. It does not mention code in supplementary materials or appendices for the described methodology. |
| Open Datasets | No | The paper is theoretical in nature, focusing on algorithm design and theoretical guarantees for Robust Multi-Agent Reinforcement Learning. It does not conduct experiments on specific datasets. While it mentions a 'generative model' for data collection, this is a theoretical construct for sampling rather than a concrete, publicly available dataset used for empirical validation. |
| Dataset Splits | No | This paper is theoretical and does not perform experiments using specific datasets that would require splitting into training, validation, and test sets. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe any experimental setup or specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical, presenting algorithms and their theoretical properties. It does not mention any specific software packages, libraries, or their version numbers that would be required to replicate an implementation of the described methods. |
| Experiment Setup | No | The paper is theoretical and describes an algorithm with theoretical guarantees, but it does not detail an experimental setup with specific hyperparameter values (e.g., learning rates, batch sizes, epochs) or other training configurations that would be used in an empirical study. The learning rates {α_k} and {η_{k+1}}, the iteration count K, and the sample size N appear as abstract parameters in the theoretical algorithm description rather than as tuned empirical settings. |
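For context on the table's "Pseudocode" row: the paper's Algorithm 2 is named Robust-Q-FTRL, and FTRL-style algorithms update each agent's per-state policy as a regularized function of accumulated Q-estimates. The sketch below is NOT the paper's algorithm — it is a generic entropy-regularized Follow-the-Regularized-Leader (i.e., exponential-weights) update, with the learning rate `eta`, the synthetic Q-estimates, and the function name `ftrl_policy` all assumptions introduced here for illustration; the robust Bellman backup and the schedules {α_k}, {η_{k+1}} from the paper are not modeled.

```python
import numpy as np

def ftrl_policy(cumulative_q, eta):
    """Entropy-regularized FTRL update: softmax of cumulative Q-estimates.

    cumulative_q : array of shape (num_actions,), running sum of Q-estimates
    eta          : learning rate (a placeholder for a schedule like {eta_{k+1}})
    """
    logits = eta * cumulative_q
    logits -= logits.max()          # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # a valid probability distribution

# Toy usage: three actions, a few rounds of synthetic Q-estimates.
rng = np.random.default_rng(0)
cum_q = np.zeros(3)
for k in range(1, 6):
    q_est = rng.normal(size=3)      # placeholder for a robust Q-estimate
    cum_q += q_est
    pi = ftrl_policy(cum_q, eta=0.5)
```

The softmax form here is one standard instantiation of FTRL (negative-entropy regularizer); the paper's actual update rule, sampling scheme, and robustness handling should be taken from Algorithm 2 itself.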