PerfSeer: An Efficient and Accurate Deep Learning Models Performance Predictor

Authors: Xinlong Zhao, Jiande Sun, Jia Zhang, Tong Liu, Ke Liu

IJCAI 2025

Reproducibility assessment (Variable / Result / LLM Response):
Research Type: Experimental. LLM Response: "We constructed a dataset containing performance metrics for 53k+ model configurations, including execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference. The evaluation results show that PerfSeer outperforms nn-Meter, BRP-NAS, and DIPPM."
Researcher Affiliation: Collaboration. LLM Response: "Xinlong Zhao, Jiande Sun, Jia Zhang, and Ke Liu (Shandong Normal University); Tong Liu (IEIT SYSTEMS Co., Ltd.)."
Pseudocode: No. LLM Response: The paper describes the workflow and update functions of the Seer Block using equations and textual descriptions (e.g., $e'_j = \phi^e(e_j, v_{s_j}, v_{t_j})$ and $v'_i = \phi^v(\bar{e}'_i, v_i, z, u)$) and an architectural diagram (Figure 2). However, it does not present these as a structured pseudocode or algorithm block with typical formatting elements such as loops, conditional statements, or explicit labels like "Algorithm 1".
Open Source Code: Yes. LLM Response: "We construct a performance dataset (https://github.com/upuuuuuu/PerfSeer)."
Open Datasets: Yes. LLM Response: "We constructed a dataset with over 53k model configurations, covering key performance metrics such as execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference on an NVIDIA GeForce RTX 3090. [...] We construct a performance dataset (https://github.com/upuuuuuu/PerfSeer)."
Dataset Splits: Yes. LLM Response: "The dataset is divided 2:1:1 for training, validation, and testing."
Hardware Specification: Yes. LLM Response: "We constructed a dataset with over 53k model configurations, covering key performance metrics such as execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference on an NVIDIA GeForce RTX 3090. [...] We evaluated the overhead of PerfSeer on an Intel i7-11700 CPU."
Software Dependencies: No. LLM Response: "PerfSeer is compatible with multiple DL frameworks, such as PyTorch, TensorFlow, and MXNet, unlike other predictors that support only a few. [...] We use a batch size of 128 and an initial learning rate of 1e-3, halving it after five epochs without improvement, down to 1e-6. Training runs for up to 500 epochs, with Mean Squared Error (MSE) as the loss function and Adam as the optimizer." Although deep learning frameworks and an optimizer are mentioned, no specific version numbers for these software components are provided to ensure reproducibility.
Experiment Setup: Yes. LLM Response: "We use a batch size of 128 and an initial learning rate of 1e-3, halving it after five epochs without improvement, down to 1e-6. Training runs for up to 500 epochs, with Mean Squared Error (MSE) as the loss function and Adam as the optimizer."
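The Seer Block update functions quoted in the Pseudocode row follow the general Graph Network pattern of per-edge and per-node updates. A minimal NumPy sketch of that pattern, as an illustration rather than the authors' implementation: the stand-in tanh/linear functions and the mean aggregation of edges to their target node are assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of GN-style updates:
#   e'_j = phi_e(e_j, v_{s_j}, v_{t_j})   per-edge update
#   v'_i = phi_v(e_bar'_i, v_i, z, u)     per-node update
# phi_e / phi_v are stand-in tanh-linear maps; aggregation is a mean.
def seer_block_step(v, e, src, tgt, z, u, W_e, W_v):
    """v: (N, d) node feats, e: (E, d) edge feats,
    src/tgt: (E,) int index arrays, z/u: (d,) global vectors,
    W_e: (3d, d), W_v: (4d, d) stand-in parameter matrices."""
    # Per-edge update from the edge and its endpoint nodes.
    e_new = np.tanh(np.concatenate([e, v[src], v[tgt]], axis=1) @ W_e)
    # Mean-aggregate updated edges onto their target node (e_bar'_i).
    agg = np.zeros_like(v)
    np.add.at(agg, tgt, e_new)
    deg = np.maximum(np.bincount(tgt, minlength=v.shape[0]), 1)[:, None]
    agg = agg / deg
    # Per-node update from aggregated edges, node feats, and globals.
    n = v.shape[0]
    glob = np.concatenate([np.tile(z, (n, 1)), np.tile(u, (n, 1))], axis=1)
    v_new = np.tanh(np.concatenate([agg, v, glob], axis=1) @ W_v)
    return v_new, e_new
```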
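The 2:1:1 train/validation/test split stated in the Dataset Splits row can be sketched as follows; the shuffling, seed, and index bookkeeping are assumptions, since the paper's exact splitting protocol is not quoted.

```python
import random

# Minimal sketch of a 2:1:1 split (assumed shuffled with a fixed seed;
# the paper only states the ratio, not the protocol).
def split_2_1_1(items, seed=0):
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n = len(items)
    n_train = n // 2            # 2 parts out of 4
    n_val = n // 4              # 1 part
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]  # remaining 1 part
    return train, val, test
```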
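The learning-rate schedule in the Experiment Setup row (start at 1e-3, halve after five epochs without improvement, floor at 1e-6) corresponds to a ReduceLROnPlateau-style policy. A framework-free sketch, with the counter-reset behaviour after each halving as an assumption:

```python
# Sketch of the reported schedule: lr starts at 1e-3, halves after
# five epochs without validation improvement, and never drops below
# 1e-6. Resetting the patience counter after a halving is an assumption.
class HalvingScheduler:
    def __init__(self, lr=1e-3, patience=5, min_lr=1e-6):
        self.lr, self.patience, self.min_lr = lr, patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr / 2, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

In PyTorch terms this would be roughly `ReduceLROnPlateau(optimizer, factor=0.5, patience=5, min_lr=1e-6)` wrapped around the Adam optimizer the paper reports.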