Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving

Authors: Changhai Zhou, Yuhua Zhou, Shiyang Zhang, Yibin Wang, Zekai Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that Dop can improve throughput by 1.30-1.46 times in a SOTA multi-tenant LoRA serving. ... The experiments were conducted with batch sizes ranging from 1 to 64. Each configuration was tested 1,000 times, and the average latency was recorded. ... Our results consistently show that Dop outperforms the existing solutions in both the LoRA operator microbenchmark and text generation throughput.
Researcher Affiliation | Academia | 1 School of Computer Science, Fudan University; 2 College of Computer Science and Technology, Zhejiang University; 3 Columbia University; 4 Zhejiang Lab. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the Dynamic Operator Optimization (Dop) method and its components (Search Space Constructor, Optimization Engine) in detail, but it does so using descriptive text and flowcharts (Figure 2), not formal pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the 'Hugging Face Transformers (Wolf et al. 2020) library' and 'Hugging Face PEFT library (Mangrulkar et al. 2022)' for Llama-2 models but does not provide any statement or link for the release of the authors' own implementation code for the methodology described.
Open Datasets | Yes | In this study, we evaluate our method using the Llama-2 models (Touvron et al. 2023) with 7B and 13B parameters.
Dataset Splits | No | The paper discusses 'workload types' such as Distinct, Uniform, Skewed, and Identical request distributions, and mentions testing with '1,000 requests' for throughput evaluation. However, it does not provide specific details on training/test/validation dataset splits, as its focus is on model serving performance rather than training.
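The four workload types named above describe how incoming requests are spread across LoRA adapters. Since the paper's exact generators are not released, the sketch below is one plausible interpretation: `generate_workload`, the round-robin reading of "Distinct", and the 1/(i+1) skew weights are all assumptions for illustration, not the authors' specification.

```python
import random

def generate_workload(kind: str, n_requests: int, n_adapters: int, seed: int = 0):
    """Assign each request to a LoRA adapter index under one of the four
    workload types. Definitions here are assumed, not taken from the paper."""
    rng = random.Random(seed)
    if kind == "identical":
        # Every request targets the same single adapter.
        return [0] * n_requests
    if kind == "distinct":
        # Each request targets a different adapter, cycling round-robin
        # when there are more requests than adapters.
        return [i % n_adapters for i in range(n_requests)]
    if kind == "uniform":
        # Adapters drawn uniformly at random.
        return [rng.randrange(n_adapters) for _ in range(n_requests)]
    if kind == "skewed":
        # Zipf-like skew: adapter i drawn with weight 1 / (i + 1),
        # so a few adapters receive most of the traffic.
        weights = [1.0 / (i + 1) for i in range(n_adapters)]
        return rng.choices(range(n_adapters), weights=weights, k=n_requests)
    raise ValueError(f"unknown workload type: {kind}")
```

A serving benchmark would then replay, say, 1,000 such requests against the system and measure end-to-end throughput for each workload type.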
Hardware Specification | Yes | The hardware used includes NVIDIA A100 40GB and NVIDIA RTX 3090 GPUs.
Software Dependencies | Yes | All experiments are conducted on Ubuntu with PyTorch 2.1.2 and CUDA 12.4. The Llama-2 models are implemented using the Hugging Face Transformers (Wolf et al. 2020) library, with LoRA weights integrated via the Hugging Face PEFT library (Mangrulkar et al. 2022).
Experiment Setup | Yes | For each scenario, Dop was executed with 300 mutation iterations, with the entire Dop execution taking approximately 1.5 hours. ... The maximum batch size was set to 32, and all systems processed requests in a first-come, first-served manner. ... The experiments were conducted with batch sizes ranging from 1 to 64.
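The measurement protocol quoted above (batch sizes 1 to 64, 1,000 trials per configuration, average latency recorded) can be sketched as a small harness. Since the authors' operator implementations are not released, `dummy_lora_op` below is a hypothetical stand-in; the harness itself only reflects the reported protocol, not the paper's actual benchmarking code.

```python
import time
import statistics

def benchmark_operator(op, batch_sizes=(1, 2, 4, 8, 16, 32, 64), n_runs=1000):
    """Measure the mean latency of op(batch_size) over n_runs repetitions,
    mirroring the reported protocol: sweep batch sizes 1-64, run each
    configuration 1,000 times, and record the average latency."""
    results = {}
    for bs in batch_sizes:
        op(bs)  # warm-up run, excluded from timing
        samples = []
        for _ in range(n_runs):
            start = time.perf_counter()
            op(bs)
            samples.append(time.perf_counter() - start)
        results[bs] = statistics.mean(samples)
    return results

def dummy_lora_op(batch_size):
    """Hypothetical stand-in for a LoRA operator: a toy computation whose
    cost grows with batch size, used only to exercise the harness."""
    return sum(i * i for i in range(batch_size * 64))
```

In a real reproduction attempt, `dummy_lora_op` would be replaced by the fused LoRA kernel under test, and the resulting per-batch-size means compared against a baseline implementation to recover throughput ratios.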