G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
Authors: Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, Tianlong Chen, Dawei Cheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on six benchmarks showcase that G-Designer is: (1) high-performing, achieving superior results on MMLU with accuracy at 84.50% and on HumanEval with pass@1 at 89.90%; (2) task-adaptive, architecting communication protocols tailored to task difficulty, reducing token consumption by up to 95.33% on HumanEval; and (3) adversarially robust, defending against agent adversarial attacks with merely 0.3% accuracy drop. |
| Researcher Affiliation | Academia | 1Tongji University 2NUS 3CUHK 4UCLA 5USTC 6NTU 7UNC-Chapel Hill. Correspondence to: Kun Wang <EMAIL>, Dawei Cheng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Designing workflow of G-Designer |
| Open Source Code | Yes | The code is available at https://github.com/yanweiyue/GDesigner. |
| Open Datasets | Yes | We evaluate G-Designer on three categories of datasets: General Reasoning: MMLU (Hendrycks et al., 2021); Mathematical Reasoning: GSM8K (Cobbe et al., 2021), MultiArith (Roy & Roth, 2016), SVAMP (Patel et al., 2021), and AQuA (Ling et al., 2017); Code: HumanEval (Chen et al., 2021). We include the dataset statistics in Table 4. |
| Dataset Splits | Yes | Given a benchmark {Q_i}^D_{i=1} consisting of D queries, G-Designer begins by optimizing with a small subset of B queries and fixes the learned parameters for testing on the remaining (D − B) queries. [...] For all benchmarks, we merely use B ∈ {40, 80} queries for optimization. |
| Hardware Specification | No | The paper mentions 'GPU cost' in Table 5 but does not provide specific details about the type of GPU, CPU, or other hardware used for running their experiments. It only refers to accessing 'GPT via the OpenAI API', which is an external service. |
| Software Dependencies | Yes | We access the GPT via the OpenAI API, and mainly test on gpt-4-1106-preview (gpt-4) and gpt-3.5-turbo-0125 (gpt-3.5). We set temperature to 0 for the single execution and single agent baselines and 1 for multi-agent methods. We set a summarizer agent to aggregate the dialogue history and produce the final solution a^(K), with K = 3 across all experiments. The Node Encoder(·) is implemented using all-MiniLM-L6-v2 (Wang et al., 2020), with the embedding dimension set to D = 384. |
| Experiment Setup | Yes | We set temperature to 0 for the single execution and single agent baselines and 1 for multi-agent methods. We set a summarizer agent to aggregate the dialogue history and produce the final solution a^(K), with K = 3 across all experiments. The Node Encoder(·) is implemented using all-MiniLM-L6-v2 (Wang et al., 2020), with the embedding dimension set to D = 384. The anchor topology Aanchor is predefined as a simple chain structure. The sampling times M are set as 10, and τ = 1e-2 and ζ = 1e-1 are set for all experiments. We provide explicit agent profiling for multi-agent methods, following the classical configurations in LLM-MA systems (Liu et al., 2023; Zhuge et al., 2024; Yin et al., 2023), and use gpt-4 to generate agent profile pools. For all benchmarks, we merely use B ∈ {40, 80} queries for optimization. |
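
The experiment setup predefines the anchor topology Aanchor as a simple chain over the agents. A minimal sketch of what such a chain adjacency matrix looks like is below; this is an illustrative reconstruction for the reproducibility reader, not the authors' code, and the function name `chain_anchor` is an assumption.

```python
import numpy as np

def chain_anchor(n_agents: int) -> np.ndarray:
    """Directed chain adjacency matrix: agent i passes its message to agent i+1.

    Illustrative sketch of the paper's 'simple chain structure' anchor
    topology; the actual G-Designer implementation may differ.
    """
    A = np.zeros((n_agents, n_agents))
    for i in range(n_agents - 1):
        A[i, i + 1] = 1.0  # single edge from agent i to its successor
    return A

# A 4-agent chain: 0 -> 1 -> 2 -> 3 (three edges, no back-edges)
A_anchor = chain_anchor(4)
```

G-Designer then learns a task-adaptive communication graph around this fixed anchor, so the chain serves only as a structural prior rather than the final topology.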