GraphRouter: A Graph-based Router for LLM Selections

Authors: Tao Feng, Yanzhen Shen, Jiaxuan You

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments across three distinct effect-cost weight scenarios show that GraphRouter substantially surpasses existing routers, delivering a minimum performance improvement of 12.3%. It also generalizes to new LLM settings and supports diverse tasks, with at least a 9.5% boost in effect and a significant reduction in computational demands.
Researcher Affiliation | Academia | Tao Feng, Yanzhen Shen, Jiaxuan You; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA; EMAIL
Pseudocode | Yes | Algorithm 1: Training of GraphRouter
Open Source Code | Yes | Our code for GraphRouter is released at https://github.com/ulab-uiuc/GraphRouter.
Open Datasets | Yes | Alpaca (Taori et al., 2023) is a hybrid question-answer (QA) dataset containing 52k samples used for fine-tuning the Alpaca model. GSM8K (Cobbe et al., 2021) evaluates a model's ability for multi-step mathematical reasoning with 8.5k linguistically diverse grade school math word problems. SQuAD (Rajpurkar, 2016) is a crowdsourced reading comprehension dataset based on Wikipedia articles. Multi-News (Fabbri et al., 2019) is a benchmark for multi-document summarization. HumanEval (Chen et al., 2021) measures LLMs' coding capabilities. HotpotQA (Yang et al., 2018) is a question-answering dataset with 113k entries featuring natural, multi-hop questions.
Dataset Splits | Yes | The data is divided into training, validation, and test sets in a 70% : 10% : 20% ratio, based on distinct queries.
Hardware Specification | Yes | All the experiments are conducted on a single NVIDIA A100 Tensor Core GPU.
Software Dependencies | No | The paper mentions PyTorch and PyG but does not provide specific version numbers for these software components; it only links to their general websites.
Experiment Setup | Yes | In the training stage, we set the graph neural network as a two-layer graph attention network with a 32-dimensional hidden layer. The batch size is 32, and the maximum number of training epochs is 1000. We use the Adam optimizer (Diederik, 2014) and gradually decay the learning rate from 1e-3 to 0 with a LambdaLR scheduler.
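The query-level 70% : 10% : 20% split described above can be sketched as follows. This is a minimal illustration, not the paper's released code; `split_by_query` and the `seed` parameter are hypothetical names introduced here.

```python
import random

def split_by_query(queries, seed=0):
    """Partition distinct queries into train/val/test sets at a
    70% / 10% / 20% ratio, mirroring the query-level split described
    in the report. Splitting by query (rather than by sample) keeps
    all samples for a given query in the same partition."""
    queries = list(queries)
    random.Random(seed).shuffle(queries)  # deterministic shuffle
    n = len(queries)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    return (queries[:n_train],
            queries[n_train:n_train + n_val],
            queries[n_train + n_val:])
```

For example, splitting 100 query IDs yields partitions of sizes 70, 10, and 20 that together cover every query exactly once.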
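The training configuration in the last row can be summarized as a small sketch. The report does not give the exact lambda function used by the scheduler, so a linear decay from 1e-3 to 0 over the 1000 epochs is assumed here; `CONFIG` and `lr_at_epoch` are illustrative names, not from the paper's code.

```python
# Reported hyperparameters for training GraphRouter.
CONFIG = {
    "gnn": "GAT",          # two-layer graph attention network
    "num_layers": 2,
    "hidden_dim": 32,
    "batch_size": 32,
    "max_epochs": 1000,
    "optimizer": "Adam",
    "base_lr": 1e-3,       # decayed toward 0 by a LambdaLR scheduler
}

def lr_at_epoch(epoch, base_lr=1e-3, max_epochs=1000):
    """Learning rate under an ASSUMED linear LambdaLR schedule:
    lr(t) = base_lr * (1 - t / max_epochs), reaching 0 at the
    final epoch. The paper's actual lambda function is not given."""
    return base_lr * max(0.0, 1.0 - epoch / max_epochs)
```

In PyTorch this decay factor would be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor at each step.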