CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

Authors: Zeqing Qin, Yiwei Wu, Lansheng Han

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate CLNX on public datasets of 25,872 C/C++ functions with their commits. The results demonstrate that CLNX substantially improves the ability of LLMs to detect C/C++ VCCs. Moreover, CLNX-equipped CodeBERT achieves new state-of-the-art performance and identifies 38 OSS vulnerabilities in the real world." |
| Researcher Affiliation | Academia | "Zeqing Qin (1,2), Yiwei Wu (1), Lansheng Han* (1,2,3); 1 School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China; 2 Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security; 3 Wuhan Jin Yin Hu Laboratory. EMAIL, EMAIL, EMAIL" |
| Pseudocode | No | "The pseudocode of the critical-path selection algorithm is shown in the extended version as Algorithm 1 (Qin, Wu, and Han 2024)." |
| Open Source Code | No | No explicit statement about providing access to the source code for the methodology described in this paper, or a link to a code repository, is found in the main text. |
| Open Datasets | Yes | "To evaluate our research questions using real-world data, we construct experimental datasets based on the publicly available Devign dataset (Zhou et al. 2019)." |
| Dataset Splits | Yes | "Our dataset contains 25,872 pairs of vulnerable and non-vulnerable functions, along with their associated commit IDs, from two major open-source C/C++ projects: FFmpeg and Qemu. The dataset is randomly split into training, validation, and test sets in an 8:1:1 ratio." |
| Hardware Specification | Yes | "All operations of CLNX, including the code analyzer, critical path selection, and key symbol transformation, are executed on an Intel Xeon(R) Gold 6326 CPU @ 2.90GHz. We perform LLM fine-tuning on a dedicated machine with an NVIDIA A100 GPU featuring 64GB of memory." |
| Software Dependencies | Yes | "Our implementation of CLNX utilizes Joern v2.0.120 and Scala v3.3.1." |
| Experiment Setup | Yes | "The fine-tuning parameters and the process are in accordance with the defect-detection subject of CodeXGLUE (Lu et al. 2021), where the block size is 400, the train batch size is 32, the eval batch size is 64, and the learning rate is 2e-5." |
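The Dataset Splits row reports a random 8:1:1 train/validation/test split over 25,872 examples. The paper does not publish its splitting code, so the following is only a minimal sketch of how such a split is conventionally done (the function name and fixed seed are our own choices, not the authors'):

```python
import random

def split_811(samples, seed=42):
    """Randomly split a list into train/validation/test at an 8:1:1 ratio.

    Illustrative sketch only; CLNX's actual splitting procedure is not
    released, so the shuffle seed and truncation behavior here are assumptions.
    """
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)         # 80% train
    n_val = int(n * 0.1)           # 10% validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (~10%) is the test set
    return train, val, test

train, val, test = split_811(list(range(25872)))
print(len(train), len(val), len(test))  # 20697 2587 2588
```

Note that with 25,872 examples the integer truncation leaves the test set one example larger than the validation set; any rounding convention would be consistent with the reported 8:1:1 ratio.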
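The Experiment Setup row fixes four fine-tuning hyperparameters (block size 400, train batch 32, eval batch 64, learning rate 2e-5) by reference to CodeXGLUE's defect-detection task. As a compact reference, the values can be collected in one place; the `run.py` invocation below is a hypothetical CodeXGLUE-style command line, not the authors' actual script:

```python
# Hyperparameters quoted in the paper, gathered for reference. The training
# loop itself (CodeXGLUE's defect-detection runner) is not reproduced here.
HYPERPARAMS = {
    "block_size": 400,        # max input length per function, in tokens
    "train_batch_size": 32,
    "eval_batch_size": 64,
    "learning_rate": 2e-5,
}

# Hypothetical CodeXGLUE-style invocation assembled from the values above;
# flag names and the model path are assumptions for illustration.
cmd = (
    "python run.py --model_name_or_path microsoft/codebert-base "
    f"--block_size {HYPERPARAMS['block_size']} "
    f"--train_batch_size {HYPERPARAMS['train_batch_size']} "
    f"--eval_batch_size {HYPERPARAMS['eval_batch_size']} "
    f"--learning_rate {HYPERPARAMS['learning_rate']}"
)
print(cmd)
```

Keeping the quoted values in a single dictionary makes it easy to check a reproduction attempt against the paper's stated configuration.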