Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

Authors: Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, Tianlong Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across six benchmarks demonstrate that AgentPrune (I) achieves results comparable to state-of-the-art topologies at merely $5.6 cost compared to their $43.7, (II) integrates seamlessly into existing multi-agent frameworks with a 28.1%–72.8% token reduction, and (III) successfully defends against two types of agent-based adversarial attacks with a 3.5%–10.8% performance boost.
Researcher Affiliation | Academia | Guibin Zhang (1), Yanwei Yue (1), Zhixun Li (2), Sukwon Yun (3), Guancheng Wan (4), Kun Wang (5), Dawei Cheng (1,6), Jeffrey Xu Yu (2), Tianlong Chen (3); (1) Tongji University, (2) The Chinese University of Hong Kong, (3) University of North Carolina at Chapel Hill, (4) Wuhan University, (5) Nanyang Technological University, (6) Shanghai AI Laboratory
Pseudocode | Yes | Algorithm 1: Execution pipeline of LLM-MA systems from a spatial-temporal graph perspective
Open Source Code | Yes | The source code is available at https://github.com/yanweiyue/AgentPrune.
Open Datasets | Yes | In our experiments, we test the performance of AgentPrune on three types of reasoning tasks and the corresponding logically challenging benchmarks: (1) General Reasoning: we opt for the MMLU (Hendrycks et al., 2021) dataset; (2) Mathematical Reasoning: we select GSM8K (Cobbe et al., 2021), MultiArith (Roy & Roth, 2016), SVAMP (Patel et al., 2021), and AQuA (Ling et al., 2017) to verify mathematical reasoning capacity; (3) Code Generation: we use HumanEval (Chen et al., 2021a) to test function-level code generation ability.
Dataset Splits | Yes | For multi-query settings, we vary Q′ ∈ {5, 10, 20} and fix M = 10. Given a benchmark consisting of Q queries, any LLM-MA framework processes these Q queries sequentially to provide solutions one by one. We utilize the initial Q′ (Q′ ≪ Q) queries as a training phase, collaboratively optimizing the spatio-temporal communication topology while leveraging multiple agents for reasoning and evaluation. Following this, we perform one-shot pruning as described in Equation (12). The fixed topology G_sub is then employed for the reasoning and evaluation of the remaining (Q − Q′) queries.
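The split described above can be sketched as a small driver loop. This is a hypothetical illustration, not the authors' code: the function names `optimize_topology`, `one_shot_prune`, and `reason` are assumed stand-ins for the paper's training-phase update, Equation (12) pruning, and fixed-topology inference.

```python
# Hypothetical sketch of the multi-query pipeline: the first Q' queries train
# the spatio-temporal topology, a one-shot prune then fixes G_sub, and the
# remaining (Q - Q') queries are answered with the fixed topology.

def run_benchmark(queries, q_prime, optimize_topology, one_shot_prune, reason):
    """Process Q queries sequentially: Q' for topology training, the rest for evaluation."""
    topology = None
    results = []
    for i, query in enumerate(queries):
        if i < q_prime:
            # Training phase: update the communication topology while answering.
            topology, answer = optimize_topology(query, topology)
        else:
            if i == q_prime:
                # One-shot pruning (Equation (12) in the paper) fixes G_sub.
                topology = one_shot_prune(topology)
            # Remaining queries reuse the fixed pruned topology.
            answer = reason(query, topology)
        results.append(answer)
    return results
```

With Q′ = 5 and Q = 20, `optimize_topology` runs five times, `one_shot_prune` exactly once, and `reason` fifteen times.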
Hardware Specification | No | We accessed the GPT models via the OpenAI API, and mainly tested on gpt-3.5-turbo-0301 (gpt-3.5) and gpt-4-1106-preview (gpt-4).
Software Dependencies | Yes | We accessed the GPT models via the OpenAI API, and mainly tested on gpt-3.5-turbo-0301 (gpt-3.5) and gpt-4-1106-preview (gpt-4).
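As a concrete illustration of the reported setup, the request for one agent's chat-completion call could be assembled as below. This is an assumed sketch, not the authors' code: `build_agent_request` and the system-profile string are hypothetical; only the model name and temperature come from the report.

```python
# Hypothetical sketch: assemble the parameters for one agent's call to the
# OpenAI chat-completions endpoint, using the model name and temperature
# reported in the experiment setup.

def build_agent_request(model, system_profile, user_query, temperature=1):
    """Return the keyword arguments for a chat-completion request."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_profile},  # agent profile
            {"role": "user", "content": user_query},
        ],
    }

# Example with the gpt-3.5 configuration from the paper; the profile text
# is illustrative, not taken from the released prompts.
req = build_agent_request(
    "gpt-3.5-turbo-0301",
    "You are a mathematical reasoning agent.",
    "Solve: 12 * 7 = ?",
)
```

A client such as `openai.OpenAI().chat.completions.create(**req)` would then send the request.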
Experiment Setup | Yes | We set the temperature to 1 during generation. We set the dialogue rounds to K = 2 for mathematical and general reasoning tasks, and K = 4 for code generation tasks. For multi-query settings, we vary Q′ ∈ {5, 10, 20} and fix M = 10. We generate different agent profiles using gpt-4. The pruning ratio is chosen from {50%, 30%}. More experimental details are in Appendix G.2. Initialization of graph masks: the graph masks S = {S_S, S_T} are initialized as 0.5 · 1^{|V|×|V|}, where 1 is the all-one matrix.
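The mask initialization and pruning ratio quoted above can be sketched as follows. This is an assumed illustration, not the authors' implementation: `init_graph_masks` and `prune_by_ratio` are hypothetical helper names, and magnitude-based thresholding is one plausible reading of "pruning ratio".

```python
# Hypothetical sketch: spatial and temporal graph masks start as
# 0.5 * all-ones over |V| x |V| agent pairs, and a pruning ratio r
# zeroes out the lowest-scoring fraction r of entries.
import numpy as np

def init_graph_masks(num_agents):
    """Return spatial and temporal masks initialized to 0.5 * 1^{|V|x|V|}."""
    spatial_mask = 0.5 * np.ones((num_agents, num_agents))
    temporal_mask = 0.5 * np.ones((num_agents, num_agents))
    return spatial_mask, temporal_mask

def prune_by_ratio(mask, ratio):
    """Zero out the `ratio` fraction of entries with the smallest mask values."""
    flat = mask.flatten()
    k = int(len(flat) * ratio)  # number of entries to drop
    if k == 0:
        return mask.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(mask > threshold, mask, 0.0)
```

With the paper's 50% ratio, half of the candidate edges would be removed after the masks have been trained away from their uniform 0.5 start.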