Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints

Authors: Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song

NeurIPS 2023 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | (Section 4, Experiments) "To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations." |
| Researcher Affiliation | Collaboration | Jiaxin Bai, Department of CSE, HKUST (EMAIL); Xin Liu, Department of CSE, HKUST (EMAIL); Weiqi Wang, Department of CSE, HKUST (EMAIL); Chen Luo, Amazon.com Inc (EMAIL); Yangqiu Song, Department of CSE, HKUST (EMAIL) |
| Pseudocode | Yes | "Algorithm 1: The algorithm used for sampling a complex query from a knowledge graph, starting from a random vertex v from the knowledge graph G with query structure T." |
| Open Source Code | Yes | "Code and data are publicly available.³" (³ https://github.com/HKUST-KnowComp/CEQA) |
| Open Datasets | Yes | "To ensure a fair comparison of various methods for the CEQA problem, we generated a dataset by sampling from ASER [53], the largest eventuality knowledge graph, which encompasses fourteen types of discourse relations... The eventuality knowledge graph, ASER-50K, is derived from a sub-sample of ASER 2.1.⁴" (⁴ https://hkust-knowcomp.github.io/ASER/html/index.html) |
| Dataset Splits | Yes | "The division of edges within each knowledge graph into training, validation, and testing sets was performed in an 8:1:1 ratio, as illustrated in Table 5." |
| Hardware Specification | Yes | "All the experiments can be run on NVIDIA RTX3090 GPUs." |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., "Python 3.8, PyTorch 1.9"); it only refers to models and frameworks generally, without version details. |
| Experiment Setup | Yes | "We use the same number of embedding sizes of three hundred for all models and use grid-search to tune the hyperparameters of the learning rate ranging from {0.002, 0.001, 0.0005, 0.0002, 0.0001} and batch size ranging from {128, 256, 512}." |
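The Pseudocode entry quotes the paper's Algorithm 1, which samples a complex query from a knowledge graph starting at a random vertex given a query structure T. A minimal sketch of that idea (the nested-tuple query representation, operator names, and `sample_query` function are illustrative assumptions, not the paper's actual Algorithm 1):

```python
import random

def sample_query(graph, structure, vertex=None):
    """Recursively sample a query matching `structure` that is answered by
    `vertex`. `structure` is a nested tuple: ("e",) anchors an entity;
    ("p", sub) projects over one relation; ("i", s1, s2) intersects two
    sub-queries. `graph[v]` maps a vertex to its incoming (relation, head)
    edges. Returns a grounded query tree, or None if `structure` cannot be
    grounded at `vertex`."""
    if vertex is None:
        vertex = random.choice(list(graph))
    op = structure[0]
    if op == "e":                    # anchor: fix the concrete entity
        return ("e", vertex)
    if op == "p":                    # projection: walk one edge backwards
        if not graph.get(vertex):
            return None
        rel, head = random.choice(graph[vertex])
        sub = sample_query(graph, structure[1], head)
        return None if sub is None else ("p", rel, sub)
    if op == "i":                    # intersection: ground every branch
        subs = [sample_query(graph, s, vertex) for s in structure[1:]]
        return None if None in subs else ("i", *subs)
    raise ValueError(f"unknown operator {op!r}")

# Toy graph with incoming edges: a -r1-> b -r2-> c; sample a 2-hop query
# answered by "c".
graph = {"b": [("r1", "a")], "c": [("r2", "b")]}
print(sample_query(graph, ("p", ("p", ("e",))), vertex="c"))
# ("p", "r2", ("p", "r1", ("e", "a")))
```

Sampling backwards from the answer vertex guarantees that the generated query has at least one answer, which is why such samplers start from a vertex rather than from random anchors.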
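The Dataset Splits entry quotes an 8:1:1 division of each knowledge graph's edges into training, validation, and test sets. A minimal sketch of such a split (the `split_edges` helper, ratios argument, and fixed seed are illustrative, not from the paper):

```python
import random

def split_edges(edges, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle knowledge-graph edges and cut them into
    train/validation/test sets according to `ratios`."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)  # fixed seed for reproducibility
    n = len(edges)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    train = edges[:n_train]
    valid = edges[n_train:n_train + n_valid]
    test = edges[n_train + n_valid:]
    return train, valid, test

# Example: 100 dummy (head, relation, tail) edges split 8:1:1.
edges = [(i, "rel", i + 1) for i in range(100)]
train, valid, test = split_edges(edges)
print(len(train), len(valid), len(test))  # 80 10 10
```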