Towards Practical Defect-Focused Automated Code Review
Authors: Junyi Lu, Lili Jiang, Xiaojia Li, Jianbing Fang, Fengjun Zhang, Li Yang, Chun Zuo
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach, validated on real-world merge requests from historical fault reports, achieves a 2× improvement over standard LLMs and a 10× gain over previous baselines. An ablation study further confirms the contribution of each component... |
| Researcher Affiliation | Collaboration | ¹Laboratory of Precise Computing, Institute of Software, Chinese Academy of Sciences, Beijing, China; ²University of Chinese Academy of Sciences, Beijing, China; ³Kuaishou Technology, Beijing, China; ⁴Independent Researcher; ⁵Sinosoft Company Limited, Beijing, China. Correspondence to: Li Yang <EMAIL>. |
| Pseudocode | Yes | The pseudo code of our slicing algorithms is presented in Section G. ... Algorithm 1 Code Slicing ... Algorithm 2 Process AST ... Algorithm 3 Generate New Slice ... Algorithm 4 Get Contiguous Diff Segment ... Algorithm 5 Apply Slicing Algorithm ... Algorithm 6 Original Diff ... Algorithm 7 Parent Function ... Algorithm 8 Left Flow ... Algorithm 9 Full Flow |
| Open Source Code | Yes | Data Availability. We publicly release our codes at https://zenodo.org/records/14779175. Details regarding their open-source status can be found in Section U. |
| Open Datasets | Yes | To systematically assess the performance of our system, we developed a dataset curated from the company's fault report platform. Each case in this dataset corresponds to an issue that resulted in actual company losses. ... We have released a desensitized JSON folder of fault descriptions in our Zenodo repository. |
| Dataset Splits | No | To systematically assess the performance of our system, we developed a dataset curated from the company's fault report platform. Each case in this dataset corresponds to an issue that resulted in actual company losses. ... The dataset consists of 45 real-world fault reports, each corresponding to a significant issue that caused financial losses, along with the associated merge request snapshots. |
| Hardware Specification | Yes | All models and baselines are hosted on a server equipped with an AMD EPYC 7702 CPU and eight Nvidia A100-40G GPUs. |
| Software Dependencies | No | The code slicing component of our framework is implemented using Cppcheck (Marjamäki, 2024), while the LLM engines are integrated through an API supported by the vLLM framework (Kwon et al., 2023), and baselines are integrated via Flask (Organization, 2024). For large models such as LLaMA3.1-405B, we utilize an Int4 version quantized using AWQ (Lin et al., 2024). |
| Experiment Setup | Yes | Our filtering mechanism, integrated within the multi-role system (Section 3.3), operates by answering three key questions for each comment: Q1: Is this comment a nitpick? ... Each question is rated on a scale from 1 to 7, with 1 indicating a nitpick, fake problem, or minimal issue, and 7 indicating a severe and real issue. ... Comments with Q1 or Q2 scores of 4 or below are discarded. |
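
The filtering rule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ScoredComment` class, the field names `q1`, `q2`, `q3`, and the helper functions are hypothetical, and the scores are assumed to have already been produced by the LLM. Only the 1 to 7 scale and the "Q1 or Q2 of 4 or below is discarded" threshold come from the quoted text.

```python
from dataclasses import dataclass
from typing import List

# From the quoted setup: each question is rated from 1 to 7, and comments
# with a Q1 or Q2 score of 4 or below are discarded.
MIN_SCORE = 1
MAX_SCORE = 7
DISCARD_AT_OR_BELOW = 4


@dataclass(frozen=True)
class ScoredComment:
    """A review comment with its three question scores (hypothetical structure)."""
    text: str
    q1: int  # Q1 score: 1 means nitpick, 7 means not a nitpick
    q2: int  # Q2 score (question text is elided in the source)
    q3: int  # Q3 score (question text is elided in the source)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-7 scale described in the source.
        for name in ("q1", "q2", "q3"):
            score = getattr(self, name)
            if not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(f"{name} must be in [1, 7], got {score}")


def keep_comment(comment: ScoredComment) -> bool:
    """Return True when both Q1 and Q2 are above the discard threshold."""
    return comment.q1 > DISCARD_AT_OR_BELOW and comment.q2 > DISCARD_AT_OR_BELOW


def filter_comments(comments: List[ScoredComment]) -> List[ScoredComment]:
    """Drop comments whose Q1 or Q2 score is 4 or below, preserving order."""
    return [c for c in comments if keep_comment(c)]


if __name__ == "__main__":
    sample = [
        ScoredComment("possible null dereference", q1=6, q2=7, q3=5),
        ScoredComment("rename variable", q1=2, q2=6, q3=1),
    ]
    for kept in filter_comments(sample):
        print(kept.text)
```

In this sketch the Q3 score is stored but does not take part in the discard decision, because the quoted rule mentions only Q1 and Q2; how Q3 is used is not stated in the excerpt.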