Uncertainty-Aware Contrastive Learning with Hard Negative Sampling for Code Search Tasks
Authors: Han Liu, Jiaqing Zhan, Qin Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results indicate that our approach outperforms 10 baseline methods on a large code search dataset covering six programming languages. The results also show that our strategies of uncertainty learning and hard negative sampling effectively enhance the representations of queries and code, leading to improved code search performance. |
| Researcher Affiliation | Academia | Han Liu¹·², Jiaqing Zhan¹, Qin Zhang¹* — ¹College of Computer Science and Software Engineering, Shenzhen University; ²Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen University. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed approach and loss functions using mathematical equations and text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We conducted experiments on the CodeSearchNet code corpus (Husain et al. 2019), to be consistent with Guo et al. CodeSearchNet contains six languages, namely, Ruby, JavaScript, Go, Python, Java, and PHP, and has been widely used in previous studies. |
| Dataset Splits | Yes | To make the experimental setup closer to real scenarios, Guo et al. expanded the candidate dataset and filtered out low-quality queries on each code corpus through rules, with data statistics shown in Table 1. Table 1 (Training / Dev / Test / Candidates): Python 251,820 / 13,914 / 14,918 / 43,827; PHP 241,241 / 12,982 / 14,014 / 52,660; Go 167,288 / 7,325 / 8,122 / 28,120; Java 164,923 / 5,183 / 10,955 / 40,347; JavaScript 58,025 / 3,885 / 3,291 / 13,981; Ruby 24,927 / 1,400 / 1,261 / 4,360. |
| Hardware Specification | Yes | All experiments were conducted on a machine equipped with four NVIDIA GeForce RTX 4090 GPUs, each with 24 GB of memory. |
| Software Dependencies | No | The paper mentions using a 'Transformer architecture', initializing with 'CoCoSoDa (Shi et al. 2023)' parameters, and using the 'AdamW optimizer'. However, it does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For training, we set the batch size to 128, the temperature hyperparameter to 0.03, the number of epochs to 10, and the random seed to 123456. The maximum sequence lengths are set to 256 for code snippets and 128 for queries. We use the AdamW optimizer with a learning rate of 8e-6. |
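The reported hyperparameters can be collected into a configuration sketch. Below is a minimal, illustrative snippet: the numeric values are the ones quoted above, but the config structure and the plain InfoNCE-style contrastive loss are assumptions for illustration; the paper's actual objective additionally involves uncertainty learning and hard negative sampling, which are not reproduced here.

```python
import math

# Hyperparameter values as reported in the paper's experiment setup.
# The dict structure itself is illustrative, not from the paper.
CONFIG = {
    "batch_size": 128,
    "temperature": 0.03,
    "epochs": 10,
    "seed": 123456,
    "max_code_length": 256,   # max tokens per code snippet
    "max_query_length": 128,  # max tokens per query
    "learning_rate": 8e-6,    # AdamW learning rate
}

def infonce_loss(sim_row, positive_index, temperature=CONFIG["temperature"]):
    """Standard InfoNCE-style contrastive loss for one query.

    `sim_row` holds similarities between a query and each candidate code in
    the batch; the temperature (0.03 here) sharpens the softmax. This is a
    generic formulation, not the paper's full uncertainty-aware loss.
    """
    scaled = [s / temperature for s in sim_row]
    m = max(scaled)  # subtract the max for numerical stability
    denom = sum(math.exp(s - m) for s in scaled)
    return -(scaled[positive_index] - m - math.log(denom))

# Example: a query whose positive code scores 0.9 against negatives at
# 0.1 and -0.2 yields a near-zero loss at this low temperature.
loss = infonce_loss([0.9, 0.1, -0.2], positive_index=0)
```

A low temperature such as 0.03 makes the softmax very peaked, so even modest similarity gaps between the positive pair and the negatives drive the loss close to zero, which is consistent with the common practice in contrastive code search training.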