Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints
Authors: Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations. |
| Researcher Affiliation | Collaboration | Jiaxin Bai (Department of CSE, HKUST, EMAIL); Xin Liu (Department of CSE, HKUST, EMAIL); Weiqi Wang (Department of CSE, HKUST, EMAIL); Chen Luo (Amazon.com Inc, EMAIL); Yangqiu Song (Department of CSE, HKUST, EMAIL) |
| Pseudocode | Yes | Algorithm 1 The algorithm used for sampling a complex query from a knowledge graph starting from a random vertex v from the knowledge graph G with query structure T. |
| Open Source Code | Yes | Code and data are publicly available. https://github.com/HKUST-KnowComp/CEQA |
| Open Datasets | Yes | To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations... The eventuality knowledge graph, ASER-50K, is derived from a sub-sample of ASER 2.1. https://hkust-knowcomp.github.io/ASER/html/index.html |
| Dataset Splits | Yes | The division of edges within each knowledge graph into training, validation, and testing sets was performed in an 8:1:1 ratio, as illustrated in Table 5. |
| Hardware Specification | Yes | All the experiments can be run on NVIDIA RTX3090 GPUs. |
| Software Dependencies | No | The paper does not explicitly provide specific version numbers for software dependencies or libraries (e.g., 'Python 3.8, PyTorch 1.9'). It only generally refers to models or frameworks without version details. |
| Experiment Setup | Yes | We use the same embedding size of 300 for all models and use grid search to tune the hyperparameters, with the learning rate ranging over {0.002, 0.001, 0.0005, 0.0002, 0.0001} and the batch size over {128, 256, 512}. |
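The pseudocode row above quotes Algorithm 1, which samples a complex query from a knowledge graph starting at a random vertex given a query structure. A minimal sketch of that kind of backward sampling is below; the structure encoding (`'e'` anchors, `'p'` projections, `'i'` intersections), function names, and toy graph are illustrative assumptions, not the paper's actual code.

```python
import random

def sample_query(graph, structure, answer):
    """Sample a query that reaches `answer` by walking edges backwards.

    graph:     dict mapping vertex -> list of (relation, head) incoming edges.
    structure: nested tuple, e.g. ('p', ('p', ('e',))) for a two-hop path.
    answer:    the vertex the sampled query must have as an answer.
    """
    if structure == ('e',):                      # leaf: an anchor entity
        return ('e', answer)
    if structure[0] == 'p':                      # relational projection step
        incoming = graph.get(answer, [])
        if not incoming:
            return None                          # dead end; caller may resample
        relation, head = random.choice(incoming)
        sub = sample_query(graph, structure[1], head)
        return None if sub is None else ('p', relation, sub)
    if structure[0] == 'i':                      # intersection of sub-queries
        subs = [sample_query(graph, s, answer) for s in structure[1:]]
        return None if any(s is None for s in subs) else ('i', *subs)

# Toy graph: 'c' is reached via r2 from 'b', which is reached via r1 from 'a'.
toy = {'c': [('r2', 'b')], 'b': [('r1', 'a')]}
query = sample_query(toy, ('p', ('p', ('e',))), 'c')
# -> ('p', 'r2', ('p', 'r1', ('e', 'a')))
```

Sampling backwards from the answer (rather than forwards from an anchor) guarantees the instantiated query has at least one answer in the graph.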
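The dataset-splits row describes dividing each knowledge graph's edges into training, validation, and testing sets in an 8:1:1 ratio. A hedged sketch of such a split, assuming a simple shuffle-and-cut scheme (the paper's actual procedure may differ):

```python
import random

def split_edges(edges, seed=0):
    """Shuffle edges, then cut at 80% and 90% for an 8:1:1 split."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)           # deterministic given seed
    n = len(edges)
    train = edges[: int(0.8 * n)]
    valid = edges[int(0.8 * n): int(0.9 * n)]
    test = edges[int(0.9 * n):]
    return train, valid, test

train, valid, test = split_edges(range(100))
# len(train), len(valid), len(test) -> 80, 10, 10
```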