EmbedLLM: Learning Compact Representations of Large Language Models

Authors: Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on multiple benchmarks without incurring additional inference cost. Extensive probing experiments validate that the learned embeddings capture key model characteristics, e.g. whether the model is specialized for coding tasks, even without being explicitly trained on them.
Researcher Affiliation | Academia | Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran; University of California, Berkeley
Pseudocode | No | The paper includes a section titled "4.3 ALGORITHM" that describes the methodology, but it is presented in prose with mathematical equations rather than as a structured pseudocode block or code-like format with explicit steps or line numbers.
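Since the paper presents its algorithm only in prose, the following is a hedged sketch of the general idea it describes: learning compact model embeddings by factorizing a binary model-question correctness matrix. All sizes, the dot-product scoring rule, the synthetic data, and the plain gradient-descent optimizer here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_questions, dim = 9, 90, 3  # toy sizes, far smaller than the paper's

# Synthetic correctness matrix: each model masters one of three skills and each
# question tests one skill, giving the matrix clean low-rank structure.
model_skill = np.arange(n_models) % 3
question_skill = np.arange(n_questions) % 3
C = (model_skill[:, None] == question_skill[None, :]).astype(float)

M = 0.1 * rng.standard_normal((n_models, dim))     # learned model embeddings
Q = 0.1 * rng.standard_normal((n_questions, dim))  # learned question embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 1.0
for _ in range(2000):
    P = sigmoid(M @ Q.T)   # predicted P(model i answers question j correctly)
    G = P - C              # gradient of the logistic loss w.r.t. the logits
    M -= lr * (G @ Q) / n_questions
    Q -= lr * (G.T @ M) / n_models

# The learned embeddings should reconstruct the correctness matrix well.
acc = ((sigmoid(M @ Q.T) > 0.5) == C).mean()
print(f"reconstruction accuracy: {acc:.2f}")
```

A predictor of this shape also enables the routing use case: for a new question, score every model embedding against the question embedding and route to the highest-scoring model.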
Open Source Code | Yes | We open source our dataset, code and embedder to facilitate further research and application: https://github.com/richardzhuang0412/EmbedLLM.
Open Datasets | Yes | We open source our dataset, code and embedder to facilitate further research and application: https://github.com/richardzhuang0412/EmbedLLM. We aggregated responses of every model to 36,054 questions from the test sets of MMLU (Hendrycks et al., 2021), TruthfulQA (Lin et al., 2022), Social IQa (Sap et al., 2019), PIQA (Bisk et al., 2019), MedMCQA (Pal et al., 2022), MathQA (Amini et al., 2019), LogiQA (Liu et al., 2020), GSM8K (Cobbe et al., 2021), GPQA (Rein et al., 2023), and ASDiv (Miao et al., 2020).
Dataset Splits | Yes | We performed a random 80%-10%-10% train-validation-test split on the questions and used the sentence transformer all-mpnet-base-v2 (Reimers & Gurevych, 2019) to convert the questions into an initial embedding state of dimension dim_q = 768.
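The reported 80%-10%-10% random split over the 36,054 questions can be sketched as follows; the seed and the exact splitting tool are not specified in the paper, so this is an illustration rather than the authors' script.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is an assumption; the paper gives none

n_questions = 36054              # total questions aggregated across benchmarks
perm = rng.permutation(n_questions)

n_train = int(0.8 * n_questions)
n_val = int(0.1 * n_questions)

train_idx = perm[:n_train]
val_idx = perm[n_train:n_train + n_val]
test_idx = perm[n_train + n_val:]

# Each question would then be encoded with all-mpnet-base-v2 into its
# 768-dimensional initial embedding (dim_q = 768), e.g. via the
# sentence-transformers package.
print(len(train_idx), len(val_idx), len(test_idx))
```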
Hardware Specification | Yes | On one NVIDIA A100 80GB GPU, it takes on average 3.80 seconds for the EmbedLLM router to route 3,000 questions over 50 repeated trials, which is essentially free compared to the downstream model inference time.
Software Dependencies | No | The paper mentions the lm-evaluation-harness package (Gao et al., 2023) and the sentence transformer all-mpnet-base-v2 (Reimers & Gurevych, 2019), but does not provide specific version numbers for key software components such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions) that would be needed for reproducibility.
Experiment Setup | Yes | We conduct hyperparameter tuning (number of neighbors for KNN, model embedding dimension for EmbedLLM) on a fixed validation set and evaluate prediction accuracy on a fixed test set. Training EmbedLLM on a correctness matrix of around 20,000 questions and 112 models for 50 epochs with batch size of 2,048 costs 107.71 TFLOPs, approximately equivalent to querying a 7B model 60 times using an input of length 128.
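The cost comparison above can be sanity-checked with the standard estimate of roughly 2 × n_params × n_tokens FLOPs per transformer forward pass; this accounting rule is an assumption, since the paper does not spell out how it converted TFLOPs into query counts.

```python
# Rough FLOPs accounting (assumption): one forward pass costs about
# 2 * n_params * n_tokens floating-point operations.
params_7b = 7e9
tokens = 128
flops_per_query = 2 * params_7b * tokens  # ~1.79 TFLOPs per 128-token query

training_cost = 107.71e12                 # reported EmbedLLM training cost
equivalent_queries = training_cost / flops_per_query
print(f"{equivalent_queries:.1f}")        # close to the ~60 queries quoted
```

Under this accounting the reported 107.71 TFLOPs indeed works out to about 60 queries of a 7B model at input length 128, consistent with the quoted claim.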