Suitable is the Best: Task-Oriented Knowledge Fusion in Vulnerability Detection
Authors: Jingjing Wang, Minhuan Huang, Yuanping Nie, Xiang Li, Qianjin Du, Wei Kong, Huan Deng, Xiaohui Kuang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that KF-GVD outperforms state-of-the-art (SOTA) methods on function-level and statement-level vulnerability detection across various target tasks, with an average increase of 40.9% in precision and 26.1% in recall. |
| Researcher Affiliation | Academia | Jingjing Wang, Minhuan Huang, Yuanping Nie, Xiang Li, Huan Deng, and Xiaohui Kuang: Institute of Systems Engineering, Academy of Military Sciences, PLA; Qianjin Du: Department of Computer Science and Technology, Tsinghua University; Wei Kong: School of Information Science and Engineering, Zhejiang Sci-Tech University. |
| Pseudocode | No | The paper describes the method and model architecture in prose and figures (e.g., Figure 3, Figure 5) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "The dataset has been uploaded to the supplementary materials, and details can be found in Appendix C." |
| Open Datasets | Yes | 80% of the source task dataset consists of CWE-119 and CWE-416 vulnerability data collected from 13 real-world C++ projects listed in the NVD (National Vulnerability Database). The remaining 20% comes from academic security defects and synthetic data provided by SARD (Software Assurance Reference Dataset). |
| Dataset Splits | Yes | Train:Validation:Test 8:1:1 |
| Hardware Specification | Yes | We conducted all experiments on a workstation equipped with a Quadro RTX 6000 GPU. |
| Software Dependencies | Yes | Code property graphs (CPGs) corresponding to the source code files were generated using Joern version 1.1.1033. We employed a pre-trained Word2Vec model... The SAGPool model deployed in both the source and target tasks was implemented using PyTorch version 1.4.0 and CUDA version 10.2. |
| Experiment Setup | Yes | Model parameter settings: Min count = 0.001; Size = 30; Window = 5; Embedding dim = 300; Hidden dim = 32; Activation function = ReLU; Learning rate = 0.0001; Optimizer = Adam; Train:Validation:Test = 8:1:1. |
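The table reports an 8:1:1 train/validation/test ratio but, in the quoted material, not how samples are assigned to each part. The sketch below is one way to reproduce that ratio, assuming a plain seeded random shuffle (not stratified by CWE type or project) followed by integer-floor cut points, with any rounding remainder going to the test set. The function name `split_by_ratio` and the default seed are hypothetical, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(
    samples: Sequence[T], ratios: Tuple[int, int, int] = (8, 1, 1), seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `samples` with a fixed seed and cut them into train/validation/test.

    Hypothetical helper: the paper only states the 8:1:1 ratio.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # assumption: plain random split, not stratified
    total = sum(ratios)
    n = len(items)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # rounding remainder lands in the test split
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_by_ratio(range(100))
    print(len(train), len(val), len(test))  # 80 10 10
```

Fixing the seed keeps the split identical across runs, which matters when comparing source-task and target-task results on the same partition.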
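The Software Dependencies row names SAGPool as the graph model used in both the source and target tasks. As a rough illustration of what a SAGPool layer does, the pure-Python sketch below follows the generic self-attention graph pooling formulation of Lee et al. (2019): a one-channel GCN layer scores each node, tanh squashes the scores, the top ⌈ratio·N⌉ nodes are kept, and their features are gated by their scores. The function name `sag_pool`, the fixed attention weights `theta`, and the default ratio of 0.5 are illustrative assumptions. This is not KF-GVD's implementation, which per the table runs on PyTorch 1.4.0 with learned weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sag_pool(
    x: Matrix, adj: Matrix, theta: List[float], ratio: float = 0.5
) -> Tuple[Matrix, Matrix, List[int]]:
    """One SAGPool-style pooling step on a dense, undirected graph (hypothetical helper).

    x     -- N x F node features
    adj   -- N x N adjacency matrix (no self-loops, non-negative weights)
    theta -- F attention weights (learned in a real model; fixed here)
    """
    n = len(x)
    # A~ = A + I: add self-loops so every node attends to itself.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # always >= 1 because of the self-loop
    # X . theta: project each node's feature vector to one scalar.
    proj = [sum(f * t for f, t in zip(row, theta)) for row in x]
    # Score Z = tanh(D^-1/2 A~ D^-1/2 X theta), i.e. a one-channel GCN layer.
    scores = [
        math.tanh(sum(a_tilde[i][j] * proj[j] / math.sqrt(deg[i] * deg[j]) for j in range(n)))
        for i in range(n)
    ]
    # Keep the ceil(ratio * N) highest-scoring nodes.
    k = math.ceil(ratio * n)
    idx = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Gate the kept features by their scores and take the induced subgraph.
    x_out = [[f * scores[i] for f in x[i]] for i in idx]
    adj_out = [[adj[i][j] for j in idx] for i in idx]
    return x_out, adj_out, idx
```

Because the score comes from a graph convolution rather than from node features alone, a node's chance of surviving pooling depends on its neighbourhood, which is the property that distinguishes SAGPool from plain top-k pooling.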