Cost-efficient Collaboration between On-device and Cloud Language Models
Authors: Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MINIONS reduces costs by 5.7× on average while recovering 97.9% of the remote-only performance. Our analysis reveals several key design choices that influence the tradeoff between cost and performance in local-remote systems. We evaluate MINIONS on three benchmarks that are well suited for data-intensive reasoning: FINANCEBENCH, LONGHEALTH, and QASPER. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Stanford University 2Department of Statistics, Stanford University 3Together AI 4Department of Biomedical Data Science, Stanford University. Correspondence to: Sabri Eyuboglu <EMAIL>. |
| Pseudocode | Yes | `def prepare_jobs(context: List[str], prev_job_manifests: Optional[List[JobManifest]] = None, prev_job_outputs: Optional[List[JobOutput]] = None) -> List[JobManifest]:` |
| Open Source Code | No | No explicit statement about code release or a link to a repository is provided in the paper. |
| Open Datasets | Yes | We evaluate MINIONS on three benchmarks that are well suited for data-intensive reasoning: FINANCEBENCH (Islam et al., 2023), LONGHEALTH (Adams et al., 2024), and QASPER (Dasigi et al., 2021). |
| Dataset Splits | Yes | For all ablations in Section 6, we use a fixed subset of 128 problems. We train on 317 questions and test on 17 held-out questions. |
| Hardware Specification | Yes | For these experiments, the Local LM is running on a single consumer-grade GPU (e.g. RTX 4090, MSRP $1,599). We run our local models on A100 GPUs. |
| Software Dependencies | No | The paper mentions models like GPT-4O, LLAMA, and QWEN2.5, and tools like Ollama and llama.cpp, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All local-only and remote-only experiments are run with temperature of 0.2. For all MINIONS experiments run in Table 1, we run the Remote LM with a temperature of 0.0 and Local LM with a temperature of 0.2 for FINANCEBENCH and 0.00001 for QASPER and LONGHEALTH. |
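The pseudocode row above gives only the `prepare_jobs` signature. A minimal runnable sketch of what such a function might do is shown below; the `JobManifest` and `JobOutput` dataclasses, their fields, and the retry logic are all assumptions for illustration, since the paper excerpt does not define them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical job structures; the paper only shows the function signature.
@dataclass
class JobManifest:
    chunk_id: int  # index of the context chunk this job reads
    task: str      # instruction for the local LM

@dataclass
class JobOutput:
    chunk_id: int
    answer: Optional[str]  # None if the chunk yielded nothing relevant

def prepare_jobs(
    context: List[str],
    prev_job_manifests: Optional[List[JobManifest]] = None,
    prev_job_outputs: Optional[List[JobOutput]] = None,
) -> List[JobManifest]:
    """Fan one extraction task out over context chunks (illustrative only)."""
    task = "Extract any facts relevant to the question."
    if prev_job_outputs is not None:
        # On later rounds, re-target only chunks that produced no answer.
        unanswered = {o.chunk_id for o in prev_job_outputs if o.answer is None}
        return [JobManifest(chunk_id=i, task=task)
                for i in sorted(unanswered) if i < len(context)]
    # First round: one job per chunk.
    return [JobManifest(chunk_id=i, task=task) for i in range(len(context))]
```

This sketch mirrors the round-based shape implied by the `prev_job_manifests` and `prev_job_outputs` parameters: the first call fans out over all chunks, and subsequent calls narrow to chunks that came back empty.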
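The temperature settings in the Experiment Setup row can be collected into a small config. The values below are the ones quoted from the paper; the dictionary layout and helper function are illustrative, not from the paper.

```python
# Remote/local sampling temperatures reported for the MINIONS Table 1 runs.
# The config structure itself is an assumption for illustration.
MINIONS_TEMPERATURES = {
    "FINANCEBENCH": {"remote": 0.0, "local": 0.2},
    "QASPER":       {"remote": 0.0, "local": 0.00001},
    "LONGHEALTH":   {"remote": 0.0, "local": 0.00001},
}

# Local-only and remote-only baselines use a single temperature.
BASELINE_TEMPERATURE = 0.2

def temps_for(benchmark: str) -> tuple:
    """Return (remote_temperature, local_temperature) for a benchmark."""
    cfg = MINIONS_TEMPERATURES[benchmark]
    return cfg["remote"], cfg["local"]
```

Usage: `temps_for("FINANCEBENCH")` returns `(0.0, 0.2)`, matching the setup quoted above.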