Decentralized Online Learning by Selfish Agents in Coalition Formation
Authors: Saar Cohen, Noa Agmon
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we introduce online learning in coalition formation through the lens of distributed decision-making, where self-interested agents operate without global coordination or information sharing, and learn only from their own experience. Under our selfish perspective, each agent seeks to maximize her own utility. Thus, we analyze the system in terms of Nash stability, where no agent can improve her utility by unilaterally deviating. We devise a sample-efficient decentralized algorithm for selfish agents that minimize their Nash regret, yielding approximately Nash stable solutions. In our algorithm, each agent uses only one utility feedback per round to update her strategy, but our algorithm still has Nash regret and sample complexity bounds that are optimal up to logarithmic factors. |
| Researcher Affiliation | Academia | Saar Cohen and Noa Agmon Department of Computer Science, Bar-Ilan University, Israel EMAIL, EMAIL |
| Pseudocode | Yes | In this section, we devise a sample-efficient decentralized online learning algorithm for DOL-ASHGs (Algorithm 1), termed Decentralized One-Sample Frank-Wolfe (D1S-FW), that requires only one sample per round to update the optimization variables and obtains sublinear Nash regret that is also polynomial in the number of agents n. Algorithm 1 D1S-FW. Input: T rounds; n agents; step sizes ρ_t, η_t, µ_t ∈ (0, 1) ∀t. 1: Initialize an arbitrary initial policy φ¹_i for each agent i. 2: for each time t = 1, …, T do 3: Each agent i joins a coalition by sampling xᵗ_i ∼ φᵗ_i. 4: Each agent i receives a utility vᵗ_i from her coalition. 5: for each agent i ∈ N individually do 6: Compute ∇̂ᵗ_i Φ using (5). 7: if t = 1 then 8: Set the gradient estimate as g¹_i := ∇̂¹_i Φ. 9: else 10: Compute Δ̂ᵗ_i = ∇̂ᵗ_i Φ − ∇̂ᵗ⁻¹_i Φ. 11: Compute the gradient estimator gᵗ_i using (2). 12: Calculate ϕ̂ᵗ_i = arg max_{ϕ ∈ S_n} ⟨ϕ, gᵗ_i⟩. 13: Compute πᵗ_i using (6) (see Remark 4). 14: Set φᵗ⁺¹_i = (1 − ρ_t)[φᵗ_i + η_t(ϕ̂ᵗ_i − φᵗ_i)] + ρ_t πᵗ_i. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. It only mentions that "All omitted proofs can be found in the supplementary materials [Cohen and Agmon, 2025a]." |
| Open Datasets | No | The paper introduces a new theoretical model for online learning in coalition formation where agents learn from repeated feedback from their interactions. It does not mention the use of any pre-existing or publicly available datasets for experiments. The learning process described is based on agents' self-generated experiences. |
| Dataset Splits | No | The paper describes a theoretical model and algorithm for decentralized online learning in coalition formation. It does not conduct experiments on a dataset; therefore, there is no mention of dataset splits (training/test/validation). |
| Hardware Specification | No | The paper focuses on theoretical contributions, including algorithm design and proof of Nash regret and sample complexity bounds. There is no mention of any experimental evaluation or specific hardware used to run such experiments. |
| Software Dependencies | No | The paper presents a theoretical algorithm and its mathematical properties. It does not describe any implementation details or mention specific software, libraries, or their version numbers that would be required to reproduce the work. |
| Experiment Setup | No | The paper describes a theoretical algorithm, D1S-FW, and analyzes its Nash regret and sample complexity bounds. While it refers to 'Step sizes ρt, ηt, µt' as parameters for the algorithm's theoretical analysis, these are not presented as part of an empirical experimental setup with concrete values used for training or evaluation. The paper does not contain any sections detailing an experimental setup, hyperparameters, or training configurations. |
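The per-agent policy update in Algorithm 1 (line 14) is a standard Frank-Wolfe step mixed with an auxiliary distribution πᵗ_i. The sketch below illustrates that single update in Python/NumPy under stated assumptions: the linear maximizer over the simplex S_n is taken as a one-hot vector at the largest gradient coordinate, and πᵗ_i (computed via the paper's Equation (6)) is replaced by a uniform placeholder. All concrete values are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def d1s_fw_update(phi, g, pi, rho, eta):
    """One D1S-FW policy update (line 14 of Algorithm 1):
    phi_{t+1} = (1 - rho_t) * [phi_t + eta_t * (phi_hat - phi_t)] + rho_t * pi_t,
    where phi_hat = argmax_{x in simplex} <x, g> is a simplex vertex,
    i.e. a one-hot vector at the largest coordinate of the gradient estimate g.
    """
    phi_hat = np.zeros_like(phi)
    phi_hat[np.argmax(g)] = 1.0  # linear maximization over the simplex
    return (1.0 - rho) * (phi + eta * (phi_hat - phi)) + rho * pi

# Toy usage with hypothetical values (pi_t taken as uniform for illustration):
n_coalitions = 4
phi = np.full(n_coalitions, 0.25)       # current mixed strategy phi_t^i
g = np.array([0.1, 0.9, 0.3, 0.2])      # gradient estimator g_t^i
pi = np.full(n_coalitions, 0.25)        # placeholder for pi_t^i from Eq. (6)
phi_next = d1s_fw_update(phi, g, pi, rho=0.1, eta=0.2)
```

Since the update is a convex combination of points in the simplex, `phi_next` remains a valid probability distribution, and mass shifts toward the coordinate with the largest estimated gradient.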