Decentralized Online Learning by Selfish Agents in Coalition Formation
Authors: Saar Cohen, Noa Agmon
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we introduce online learning in coalition formation through the lens of distributed decision-making, where self-interested agents operate without global coordination or information sharing, and learn only from their own experience. Under our selfish perspective, each agent seeks to maximize her own utility. Thus, we analyze the system in terms of Nash stability, where no agent can improve her utility by unilaterally deviating. We devise a sample-efficient decentralized algorithm for selfish agents that minimize their Nash regret, yielding approximately Nash stable solutions. In our algorithm, each agent uses only one utility feedback per round to update her strategy, but our algorithm still has Nash regret and sample complexity bounds that are optimal up to logarithmic factors. |
| Researcher Affiliation | Academia | Saar Cohen and Noa Agmon Department of Computer Science, Bar-Ilan University, Israel EMAIL, EMAIL |
| Pseudocode | Yes | In this section, we devise a sample-efficient decentralized online learning algorithm for DOL-ASHGs (Algorithm 1), termed Decentralized One-Sample Frank-Wolfe (D1S-FW), that requires only one sample per round to update the optimization variables and obtains sublinear Nash regret that is also polynomial in the number of agents n. Algorithm 1 D1S-FW. Input: T rounds; n agents; step sizes ρ_t, η_t, µ_t ∈ (0, 1) ∀t. 1: Initialize an arbitrary initial policy φ¹_i for each agent i. 2: for each time t = 1, …, T do 3: Each agent i joins a coalition by sampling xᵗ_i ∼ φᵗ_i. 4: Each agent i receives a utility vᵗ_i from her coalition. 5: for each agent i ∈ N individually do 6: Compute ∇̂ᵗ_i Φ using (5). 7: if t = 1 then 8: Set the gradient estimate as g¹_i := ∇̂¹_i Φ. 9: else 10: Compute Δ̂ᵗ_i = ∇̂ᵗ_i Φ − ∇̂ᵗ⁻¹_i Φ. 11: Compute the gradient estimator gᵗ_i using (2). 12: Calculate ϕ̂ᵗ_i = arg max_{ϕ ∈ S_n} ⟨ϕ, gᵗ_i⟩. 13: Compute πᵗ_i using (6) (see Remark 4). 14: Set φᵗ⁺¹_i = (1 − ρ_t)[φᵗ_i + η_t(ϕ̂ᵗ_i − φᵗ_i)] + ρ_t πᵗ_i. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. It only mentions that "All omitted proofs can be found in the supplementary materials [Cohen and Agmon, 2025a]." |
| Open Datasets | No | The paper introduces a new theoretical model for online learning in coalition formation where agents learn from repeated feedback from their interactions. It does not mention the use of any pre-existing or publicly available datasets for experiments. The learning process described is based on agents' self-generated experiences. |
| Dataset Splits | No | The paper describes a theoretical model and algorithm for decentralized online learning in coalition formation. It does not conduct experiments on a dataset; therefore, there is no mention of dataset splits (training/test/validation). |
| Hardware Specification | No | The paper focuses on theoretical contributions, including algorithm design and proof of Nash regret and sample complexity bounds. There is no mention of any experimental evaluation or specific hardware used to run such experiments. |
| Software Dependencies | No | The paper presents a theoretical algorithm and its mathematical properties. It does not describe any implementation details or mention specific software, libraries, or their version numbers that would be required to reproduce the work. |
| Experiment Setup | No | The paper describes a theoretical algorithm, D1S-FW, and analyzes its Nash regret and sample complexity bounds. While it refers to 'Step sizes ρt, ηt, µt' as parameters for the algorithm's theoretical analysis, these are not presented as part of an empirical experimental setup with concrete values used for training or evaluation. The paper does not contain any sections detailing an experimental setup, hyperparameters, or training configurations. |
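The per-agent policy update in Algorithm 1 (line 14) is a standard Frank-Wolfe step mixed with an auxiliary distribution πᵗ_i. The sketch below illustrates that single update in Python/NumPy under stated assumptions: the linear maximizer over the simplex S_n is taken as a one-hot vector at the largest gradient coordinate, and πᵗ_i (computed via the paper's Equation (6)) is replaced by a uniform placeholder. All concrete values are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def d1s_fw_update(phi, g, pi, rho, eta):
    """One D1S-FW policy update (line 14 of Algorithm 1):
    phi_{t+1} = (1 - rho_t) * [phi_t + eta_t * (phi_hat - phi_t)] + rho_t * pi_t,
    where phi_hat = argmax_{x in simplex} <x, g> is a simplex vertex,
    i.e. a one-hot vector at the largest coordinate of the gradient estimate g.
    """
    phi_hat = np.zeros_like(phi)
    phi_hat[np.argmax(g)] = 1.0  # linear maximization over the simplex
    return (1.0 - rho) * (phi + eta * (phi_hat - phi)) + rho * pi

# Toy usage with hypothetical values (pi_t taken as uniform for illustration):
n_coalitions = 4
phi = np.full(n_coalitions, 0.25)       # current mixed strategy phi_t^i
g = np.array([0.1, 0.9, 0.3, 0.2])      # gradient estimator g_t^i
pi = np.full(n_coalitions, 0.25)        # placeholder for pi_t^i from Eq. (6)
phi_next = d1s_fw_update(phi, g, pi, rho=0.1, eta=0.2)
```

Since the update is a convex combination of points in the simplex, `phi_next` remains a valid probability distribution, and mass shifts toward the coordinate with the largest estimated gradient.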