Entity Alignment with Noisy Annotations from Large Language Models
Authors: Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the advantages of LLM4EA on four benchmark datasets in terms of effectiveness, robustness, and efficiency. |
| Researcher Affiliation | Academia | All six authors (Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang): Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR. |
| Pseudocode | Yes | Algorithm 1 The greedy label refinement algorithm |
| Open Source Code | Yes | We have provided the code for the framework, accessible via this URL: https://github.com/chensyCN/llm4ea_official. |
| Open Datasets | Yes | In this study, we use the widely-adopted OpenEA dataset (Sun et al., 2020), including two monolingual datasets (D-W-15K and D-Y-15K) and two cross-lingual datasets (EN-DE-15K and EN-FR-15K). OpenEA comes in two versions: "V1", the normal version, and "V2", the dense version. We employ "V2" in the experiments in the main text. |
| Dataset Splits | No | The paper mentions training, but does not explicitly state training, validation, and test splits with percentages or counts. It refers to standard datasets but not their specific splits. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with six NVIDIA GeForce RTX 3090 GPUs, 48 Intel(R) Xeon(R) Silver 4214R CPUs, and 376GB of host memory. |
| Software Dependencies | Yes | The details of the software packages used in our experiments are listed in Table 4. Table 4 (package configurations): tqdm 4.66.2, numpy 1.24.4, scipy 1.10.1, tensorflow 2.7.0, keras 2.7.0, openai 1.30.1. |
| Experiment Setup | Yes | Setup of LLM4EA. We employ GPT-3.5 as the default LLM due to its cost efficiency. Other parameters are n = 3, nlr, k = 20, δ0 = 0.5, δ1 = 0.9. |
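Putting the open-source code (row "Open Source Code") and the Table 4 package versions (row "Software Dependencies") together, a reproduction environment can be sketched as below. This is an assumption-laden setup fragment, not an official install script: the report does not state the Python version (3.8+ is assumed here for compatibility with tensorflow 2.7.0 and numpy 1.24.4), nor whether the repository ships its own requirements file.

```shell
# Hypothetical setup sketch based on the repository URL and Table 4 versions
# reported above; the paper/report does not provide an install script.
git clone https://github.com/chensyCN/llm4ea_official.git
cd llm4ea_official

# Pin the package versions listed in Table 4 of the paper.
pip install tqdm==4.66.2 numpy==1.24.4 scipy==1.10.1 \
    tensorflow==2.7.0 keras==2.7.0 openai==1.30.1
```

Note that the openai client (1.30.1) requires an `OPENAI_API_KEY` environment variable at run time, since the framework queries GPT-3.5 for annotations.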