PerfSeer: An Efficient and Accurate Deep Learning Models Performance Predictor
Authors: Xinlong Zhao, Jiande Sun, Jia Zhang, Tong Liu, Ke Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We constructed a dataset containing performance metrics for 53k+ model configurations, including execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference. The evaluation results show that PerfSeer outperforms nn-Meter, BRP-NAS, and DIPPM. |
| Researcher Affiliation | Collaboration | Xinlong Zhao¹, Jiande Sun¹, Jia Zhang¹, Tong Liu² and Ke Liu¹ (¹Shandong Normal University; ²IEIT SYSTEMS Co., Ltd.) |
| Pseudocode | No | The paper describes the workflow and update functions of the Seer Block using equations and textual descriptions (e.g., "e′_j = φ^e(e_j, v_{s_j}, v_{t_j})", "v′_i = φ^v(ē′_i, v_i, z, u)") and an architectural diagram (Figure 2). However, it does not present these as a structured pseudocode or algorithm block with typical formatting elements such as loops, conditional statements, or explicit labels like "Algorithm 1". |
| Open Source Code | Yes | We construct a performance dataset.¹ ¹https://github.com/upuuuuuu/PerfSeer |
| Open Datasets | Yes | We constructed a dataset with over 53k model configurations, covering key performance metrics such as execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference on an Nvidia GeForce RTX 3090. [...] We construct a performance dataset.¹ ¹https://github.com/upuuuuuu/PerfSeer |
| Dataset Splits | Yes | The dataset is divided into 2:1:1 for training, validation, and testing. |
| Hardware Specification | Yes | We constructed a dataset with over 53k model configurations, covering key performance metrics such as execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference on an Nvidia GeForce RTX 3090. [...] We evaluated the overhead of PerfSeer on an Intel i7-11700 CPU |
| Software Dependencies | No | PerfSeer is compatible with multiple DL frameworks, such as PyTorch, TensorFlow, and MXNet, unlike other predictors that support only a few. [...] We use a batch size of 128 and an initial learning rate of 1e-3, halving it after five epochs without improvement, down to 1e-6. Training runs for up to 500 epochs, with Mean Squared Error (MSE) as the loss function and Adam as the optimizer. Although deep learning frameworks and an optimizer are mentioned, no specific version numbers for these software components are provided to ensure reproducibility. |
| Experiment Setup | Yes | We use a batch size of 128 and an initial learning rate of 1e-3, halving it after five epochs without improvement, down to 1e-6. Training runs for up to 500 epochs, with Mean Squared Error (MSE) as the loss function and Adam as the optimizer. |
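The Seer Block update functions quoted in the Pseudocode row appear to follow the Graph Network formalism (edge update, then node update); a cleaned-up rendering under that assumption, where s_j and t_j are the source and target nodes of edge j and ē′_i aggregates the updated edges incident to node i:

```latex
% Assumed Graph-Network-style updates; symbols follow the quoted fragments.
\begin{align}
  \mathbf{e}'_j &= \phi^{e}\!\left(\mathbf{e}_j,\; \mathbf{v}_{s_j},\; \mathbf{v}_{t_j}\right) \\
  \mathbf{v}'_i &= \phi^{v}\!\left(\bar{\mathbf{e}}'_i,\; \mathbf{v}_i,\; \mathbf{z},\; \mathbf{u}\right)
\end{align}
```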
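The 2:1:1 train/validation/test division reported in the Dataset Splits row can be reproduced with a simple shuffled index split. A minimal stdlib-only sketch (the function name and seed are illustrative, not from the paper):

```python
import random

def split_2_1_1(n_samples, seed=0):
    """Shuffle indices and split them 2:1:1 into train/val/test,
    mirroring the reported dataset division. Seed is illustrative."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples // 2          # 2 of 4 parts
    n_val = n_samples // 4            # 1 of 4 parts
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_2_1_1(53000)
```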
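The learning-rate schedule in the Experiment Setup row (start at 1e-3, halve after five epochs without improvement, floor at 1e-6) matches a reduce-on-plateau policy; in PyTorch this would be `ReduceLROnPlateau(factor=0.5, patience=5, min_lr=1e-6)`. A framework-free sketch of the same logic (class and attribute names are my own, not from the paper):

```python
class HalvingOnPlateau:
    """Sketch of the reported schedule: halve the learning rate after
    `patience` consecutive epochs without validation improvement,
    never dropping below `min_lr`."""

    def __init__(self, lr=1e-3, patience=5, factor=0.5, min_lr=1e-6):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the lr to use."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

Paired with a batch size of 128, MSE loss, Adam, and an upper bound of 500 epochs, this covers the stated setup.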