EmbedLLM: Learning Compact Representations of Large Language Models
Authors: Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on multiple benchmarks without incurring additional inference cost. Extensive probing experiments validate that the learned embeddings capture key model characteristics, e.g. whether the model is specialized for coding tasks, even without being explicitly trained on them. |
| Researcher Affiliation | Academia | Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran; University of California, Berkeley |
| Pseudocode | No | The paper includes a section titled "4.3 ALGORITHM" which describes the methodology, but it is presented in prose with mathematical equations rather than a structured pseudocode block or code-like format with explicit steps or line numbers. |
| Open Source Code | Yes | We open source our dataset, code and embedder to facilitate further research and application: https://github.com/richardzhuang0412/EmbedLLM. |
| Open Datasets | Yes | We open source our dataset, code and embedder to facilitate further research and application: https://github.com/richardzhuang0412/EmbedLLM. We aggregated responses of every model to 36,054 questions from the test sets of MMLU (Hendrycks et al., 2021), TruthfulQA (Lin et al., 2022), SocialQA (Sap et al., 2019), PIQA (Bisk et al., 2019), MedMCQA (Pal et al., 2022), MathQA (Amini et al., 2019), LogiQA (Liu et al., 2020), GSM8K (Cobbe et al., 2021), GPQA (Rein et al., 2023), and ASDiv (Miao et al., 2020). |
| Dataset Splits | Yes | We performed a random 80%-10%-10% train-validation-test split on the questions and used the sentence transformer all-mpnet-base-v2 (Reimers & Gurevych, 2019) to convert the questions into an initial embedding state of dimension dim_q = 768. |
| Hardware Specification | Yes | On one NVIDIA A100 80GB GPU, it takes on average 3.80 seconds for the EmbedLLM router to route 3,000 questions over 50 repeated trials, which is negligible compared to the downstream model inference time. |
| Software Dependencies | No | The paper mentions "lm-evaluation-harness package (Gao et al., 2023)" and "sentence transformer all-mpnet-base-v2 (Reimers & Gurevych, 2019)" but does not provide specific version numbers for key software components such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | We conduct hyperparameter tuning (number of neighbors for KNN, model embedding dimension for EmbedLLM) on a fixed validation set and evaluate prediction accuracy using a fixed test set. Training EmbedLLM on a correctness matrix of around 20,000 questions on 112 models for 50 epochs with batch size of 2,048 costs 107.71 TFlops, approximately equivalent to querying a 7B model 60 times with an input of length 128. |
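The setup quoted above — learning a low-dimensional embedding per model from a binary correctness matrix, then routing each question to the model predicted most likely to answer it correctly — can be sketched as a toy logistic matrix factorization. This is a hedged NumPy reconstruction, not the authors' released code: the matrix size (8 models × 200 questions instead of 112 × ~20,000), embedding dimension, learning rate, and synthetic low-rank correctness data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's correctness matrix (theirs: 112 models
# x ~20,000 questions). M[i, j] = 1 iff model i answers question j
# correctly; we synthesize it from hidden factors so it is learnable.
n_models, n_questions, dim = 8, 200, 4
U_true = rng.standard_normal((n_models, dim))
V_true = rng.standard_normal((n_questions, dim))
M = (U_true @ V_true.T > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Logistic matrix factorization: learn per-model and per-question
# embeddings so that sigmoid(<u_i, v_j>) approximates M[i, j].
U = 0.1 * rng.standard_normal((n_models, dim))
V = 0.1 * rng.standard_normal((n_questions, dim))
lr = 1.0
for _ in range(1000):
    G = sigmoid(U @ V.T) - M  # gradient of binary cross-entropy w.r.t. logits
    U, V = U - lr * (G @ V) / n_questions, V - lr * (G.T @ U) / n_models

# How well the learned embeddings reconstruct held-in correctness.
acc = ((sigmoid(U @ V.T) > 0.5) == M).mean()

# Routing: send each question to the model with the highest predicted
# probability of answering it correctly, then score the routed answers.
route = np.argmax(U @ V.T, axis=0)
routed_acc = M[route, np.arange(n_questions)].mean()
```

In the paper's pipeline the question side starts from a fixed 768-dimensional sentence-transformer embedding rather than a free parameter, but the core idea — dot products between compact model and question representations predicting correctness, with argmax over models giving the router — is the same.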