Entity Alignment with Noisy Annotations from Large Language Models
Authors: Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the advantages of LLM4EA on four benchmark datasets in terms of effectiveness, robustness, and efficiency. |
| Researcher Affiliation | Academia | All six authors (Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang): Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR. |
| Pseudocode | Yes | Algorithm 1 The greedy label refinement algorithm |
| Open Source Code | Yes | We have provided the code for the framework, accessible via this URL: https://github.com/chensyCN/llm4ea_official. |
| Open Datasets | Yes | In this study, we use the widely-adopted OpenEA dataset (Sun et al., 2020), including two monolingual datasets (D-W-15K and D-Y-15K) and two cross-lingual datasets (EN-DE-15K and EN-FR-15K). OpenEA comes in two versions: "V1", the normal version, and "V2", the dense version. We employ "V2" in the experiments in the main text. |
| Dataset Splits | No | The paper mentions training, but does not explicitly state training, validation, and test splits with percentages or counts. It refers to standard datasets but not their specific splits. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with six NVIDIA GeForce RTX 3090 GPUs, 48 Intel(R) Xeon(R) Silver 4214R CPUs, and 376GB of host memory. |
| Software Dependencies | Yes | The details of the software packages used in our experiments are listed in Table 4. Table 4 (package configurations): tqdm 4.66.2, numpy 1.24.4, scipy 1.10.1, tensorflow 2.7.0, keras 2.7.0, openai 1.30.1. |
| Experiment Setup | Yes | Setup of LLM4EA. We employ GPT-3.5 as the default LLM due to its cost efficiency. Other parameters are n = 3, nlr, k = 20, δ0 = 0.5, δ1 = 0.9. |
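Putting the open-source code (row "Open Source Code") and the Table 4 package versions (row "Software Dependencies") together, a reproduction environment can be sketched as below. This is an assumption-laden setup fragment, not an official install script: the report does not state the Python version (3.8+ is assumed here for compatibility with tensorflow 2.7.0 and numpy 1.24.4), nor whether the repository ships its own requirements file.

```shell
# Hypothetical setup sketch based on the repository URL and Table 4 versions
# reported above; the paper/report does not provide an install script.
git clone https://github.com/chensyCN/llm4ea_official.git
cd llm4ea_official

# Pin the package versions listed in Table 4 of the paper.
pip install tqdm==4.66.2 numpy==1.24.4 scipy==1.10.1 \
    tensorflow==2.7.0 keras==2.7.0 openai==1.30.1
```

Note that the openai client (1.30.1) requires an `OPENAI_API_KEY` environment variable at run time, since the framework queries GPT-3.5 for annotations.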