Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints

Authors: Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song

NeurIPS 2023 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | (Section 4, Experiments) "To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations." |
| Researcher Affiliation | Collaboration | Jiaxin Bai, Department of CSE, HKUST (EMAIL); Xin Liu, Department of CSE, HKUST (EMAIL); Weiqi Wang, Department of CSE, HKUST (EMAIL); Chen Luo, Amazon.com Inc (EMAIL); Yangqiu Song, Department of CSE, HKUST (EMAIL) |
| Pseudocode | Yes | "Algorithm 1: The algorithm used for sampling a complex query from a knowledge graph, starting from a random vertex v from the knowledge graph G with query structure T." |
| Open Source Code | Yes | "Code and data are publicly available.³" (³ https://github.com/HKUST-KnowComp/CEQA) |
| Open Datasets | Yes | "To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations... The eventuality knowledge graph, ASER-50K, is derived from a sub-sample of ASER 2.1.⁴" (⁴ https://hkust-knowcomp.github.io/ASER/html/index.html) |
| Dataset Splits | Yes | "The division of edges within each knowledge graph into training, validation, and testing sets was performed in an 8:1:1 ratio, as illustrated in Table 5." |
| Hardware Specification | Yes | "All the experiments can be run on NVIDIA RTX3090 GPUs." |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., "Python 3.8, PyTorch 1.9"); it only refers to models and frameworks generally, without version details. |
| Experiment Setup | Yes | "We use the same number of embedding sizes of three hundred for all models and use grid-search to tune the hyperparameters of the learning rate ranging from {0.002, 0.001, 0.0005, 0.0002, 0.0001} and batch size ranging from {128, 256, 512}." |
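The Pseudocode entry quotes the paper's Algorithm 1, which samples a complex query from a knowledge graph starting at a random vertex given a query structure T. A minimal sketch of that idea (the nested-tuple query representation, operator names, and `sample_query` function are illustrative assumptions, not the paper's actual Algorithm 1):

```python
import random

def sample_query(graph, structure, vertex=None):
    """Recursively sample a query matching `structure` that is answered by
    `vertex`. `structure` is a nested tuple: ("e",) anchors an entity;
    ("p", sub) projects over one relation; ("i", s1, s2) intersects two
    sub-queries. `graph[v]` maps a vertex to its incoming (relation, head)
    edges. Returns a grounded query tree, or None if `structure` cannot be
    grounded at `vertex`."""
    if vertex is None:
        vertex = random.choice(list(graph))
    op = structure[0]
    if op == "e":                    # anchor: fix the concrete entity
        return ("e", vertex)
    if op == "p":                    # projection: walk one edge backwards
        if not graph.get(vertex):
            return None
        rel, head = random.choice(graph[vertex])
        sub = sample_query(graph, structure[1], head)
        return None if sub is None else ("p", rel, sub)
    if op == "i":                    # intersection: ground every branch
        subs = [sample_query(graph, s, vertex) for s in structure[1:]]
        return None if None in subs else ("i", *subs)
    raise ValueError(f"unknown operator {op!r}")

# Toy graph with incoming edges: a -r1-> b -r2-> c; sample a 2-hop query
# answered by "c".
graph = {"b": [("r1", "a")], "c": [("r2", "b")]}
print(sample_query(graph, ("p", ("p", ("e",))), vertex="c"))
# ("p", "r2", ("p", "r1", ("e", "a")))
```

Sampling backwards from the answer vertex guarantees that the generated query has at least one answer, which is why such samplers start from a vertex rather than from random anchors.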
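The Dataset Splits entry quotes an 8:1:1 division of each knowledge graph's edges into training, validation, and test sets. A minimal sketch of such a split (the `split_edges` helper, ratios argument, and fixed seed are illustrative, not from the paper):

```python
import random

def split_edges(edges, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle knowledge-graph edges and cut them into
    train/validation/test sets according to `ratios`."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)  # fixed seed for reproducibility
    n = len(edges)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    train = edges[:n_train]
    valid = edges[n_train:n_train + n_valid]
    test = edges[n_train + n_valid:]
    return train, valid, test

# Example: 100 dummy (head, relation, tail) edges split 8:1:1.
edges = [(i, "rel", i + 1) for i in range(100)]
train, valid, test = split_edges(edges)
print(len(train), len(valid), len(test))  # 80 10 10
```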