GraphRouter: A Graph-based Router for LLM Selections
Authors: Tao Feng, Yanzhen Shen, Jiaxuan You
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments across three distinct effect-cost weight scenarios have shown that GraphRouter substantially surpasses existing routers, delivering a minimum performance improvement of 12.3%. In addition, it achieves enhanced generalization across new LLM settings and supports diverse tasks with at least a 9.5% boost in effect and a significant reduction in computational demands. |
| Researcher Affiliation | Academia | Tao Feng, Yanzhen Shen, Jiaxuan You; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA; EMAIL |
| Pseudocode | Yes | Algorithm 1: Training of GraphRouter |
| Open Source Code | Yes | Our code for GraphRouter is released at https://github.com/ulab-uiuc/GraphRouter. |
| Open Datasets | Yes | Alpaca (Taori et al., 2023) is a hybrid question-answer (QA) dataset containing 52k samples used for fine-tuning the Alpaca model. GSM8K (Cobbe et al., 2021) evaluates the model's ability for multi-step mathematical reasoning with 8.5k linguistically diverse grade school math word problems. SQuAD (Rajpurkar, 2016) is a crowdsourced reading comprehension dataset based on Wikipedia articles. Multi-News (Fabbri et al., 2019) is a benchmark on multi-document summarization. HumanEval (Chen et al., 2021) is a dataset that measures LLMs' coding capabilities. HotpotQA (Yang et al., 2018) is a question answering dataset with 113k entries featuring natural, multi-hop questions. |
| Dataset Splits | Yes | The data is divided into training, validation, and test sets in a ratio of 70% : 10% : 20%, based on different queries (see the split sketch below the table). |
| Hardware Specification | Yes | all the experiments are conducted on a single NVIDIA A100 Tensor Core GPU. |
| Software Dependencies | No | The paper mentions "PyTorch" and "PyG" but does not provide specific version numbers for these software components; it only links to their general websites. |
| Experiment Setup | Yes | In the training stage, we set the graph neural network as a two-layer graph attention network, with a 32-dim hidden dimension. The batch size is 32, and the max training epoch is set to 1000. We use Adam optimizer (Diederik, 2014) for model training and gradually decay the learning rate from 1e-3 to 0 with LambdaLR scheduler (a configuration sketch follows the table). |
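
The query-level split described in the Dataset Splits row can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' released code: the `records` list, its `"query"` key, and the fixed seed are illustrative assumptions; the only constraints taken from the paper are the 70% : 10% : 20% ratio and that the split is made over distinct queries.

```python
import random

def split_by_query(records, seed=0):
    """Split records 70/10/20 into train/val/test by distinct query,
    so all rows sharing a query land in the same split (hypothetical
    helper; the paper only states the ratio and that splits are
    'based on different queries')."""
    queries = sorted({r["query"] for r in records})
    random.Random(seed).shuffle(queries)
    n = len(queries)
    train_q = set(queries[: int(0.7 * n)])
    val_q = set(queries[int(0.7 * n): int(0.8 * n)])
    splits = {"train": [], "val": [], "test": []}
    for r in records:
        key = ("train" if r["query"] in train_q
               else "val" if r["query"] in val_q
               else "test")
        splits[key].append(r)
    return splits
```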
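
The Experiment Setup row maps directly onto standard PyTorch/PyG components. The sketch below is an illustration under stated assumptions, not the released implementation: the class name `RouterGNN`, the input/output dimensions, and the default single attention head are placeholders, while the two `GATConv` layers, 32-dim hidden size, Adam optimizer, and the LambdaLR schedule decaying the learning rate from 1e-3 to 0 over 1000 epochs come from the quoted setup.

```python
import torch
from torch_geometric.nn import GATConv

class RouterGNN(torch.nn.Module):
    # Two-layer graph attention network with a 32-dim hidden layer,
    # as reported; in_dim/out_dim are placeholders, not paper values.
    def __init__(self, in_dim, out_dim, hidden_dim=32):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim)
        self.conv2 = GATConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = RouterGNN(in_dim=128, out_dim=1)  # hypothetical dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Linearly decay the learning rate from 1e-3 to 0 over the stated
# 1000-epoch budget using PyTorch's LambdaLR scheduler.
max_epoch = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / max_epoch
)

for epoch in range(max_epoch):
    # ... forward pass, loss, and optimizer.step() over batches of 32,
    # per the training procedure the paper's Algorithm 1 defines ...
    scheduler.step()
```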