Logic Induced High-Order Reasoning Network for Event-Event Relation Extraction

Authors: Peixin Huang, Xiang Zhao, Minghao Hu, Zhen Tan, Weidong Xiao

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the effectiveness of the proposed method with state-of-the-art performance on benchmark datasets." From the Experiments section (Datasets and Metrics): "We evaluate LogicERE on four widely used datasets. MATRES (Ning, Wu, and Roth 2018) and TCR (Ning et al. 2018) are used to test the performance of TRE. HiEve (Glavas et al. 2014) is used for SRE. MAVEN-ERE (Wang et al. 2022) is used to test the joint learning performance. We adopt the standard micro-averaged Precision (P), Recall (R) and F1-scores (F1) as evaluation metrics. All the results are the average of five trials of different random seeds in each experiment." From the Ablation Study: "We then conduct an ablation study to elucidate the effectiveness of the main components of our model."
Researcher Affiliation | Academia | 1 National Key Laboratory of Information Systems Engineering, National University of Defense Technology, China; 2 Laboratory for Big Data and Decision, National University of Defense Technology, China; 3 Information Research Center of Military Science, China. EMAIL, EMAIL
Pseudocode | No | The paper describes the model architecture and training process using mathematical equations and textual descriptions (e.g., Sequence Encoder, Logic Constraint Induced Graph, High-Order Reasoning Network on LCG, Joint Logic Learning), but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include any links to a code repository. It only mentions an extended version of the paper on arXiv.
Open Datasets | Yes | "We evaluate LogicERE on four widely used datasets. MATRES (Ning, Wu, and Roth 2018) and TCR (Ning et al. 2018) are used to test the performance of TRE. HiEve (Glavas et al. 2014) is used for SRE. MAVEN-ERE (Wang et al. 2022) is used to test the joint learning performance."
Dataset Splits | Yes | "For compatible comparison, we utilize the same data splits as in prior work for the considered datasets. We briefly summarize the data statistics for the above datasets in Table 1."

Table 1: Data statistics for MATRES, TCR, HiEve and MAVEN-ERE (TRE/SRE).

Dataset          Unit         Train    Dev      Test
MATRES           Documents    260      21       20
                 Event pairs  10,888   1,852    840
TCR              Documents    -        -        25
                 Event pairs  -        -        2,646
HiEve            Documents    80       -        20
                 Event pairs  35,001   -        7,093
MAVEN-ERE (TRE)  Documents    2,913    710      857
                 Event pairs  792,445  188,928  234,844
MAVEN-ERE (SRE)  Documents    2,913    710      857
                 Event pairs  9,193    2,826    3,822

(TCR is used for evaluation only; HiEve reports train/test counts.)
Hardware Specification | No | The paper mentions using RoBERTa-base as the document encoder and implementing with Hugging Face Transformers and PyTorch, but it does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | "Our implementation uses Hugging Face Transformers (Wolf et al. 2020) and PyTorch (Paszke et al. 2019)." While these libraries are mentioned, specific version numbers for them or any other critical software components are not provided.
Experiment Setup | Yes | "As for the input of the encoder, we set the dynamic window size to 256, and divide documents into several overlapping windows with a step size of 32. We use the AdamW (Loshchilov and Hutter 2019) optimizer, and the learning rate is set to 2e-5. We adopt layer normalization (Ba, Kiros, and Hinton 2016) and dropout (Srivastava et al. 2014) between the high-order reasoning network layers. We perform early stopping and tune the hyper-parameters by grid search on the development set: heads C ∈ {1, 2, 4, 8}, dropout rate ∈ {0.1, 0.2, 0.3} and loss coefficients γsym, γconj ∈ {0.1, 0.2, 0.4, 0.6}."
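The report quotes micro-averaged Precision, Recall, and F1 as the evaluation metrics. As a minimal sketch of what micro-averaging means (the helper name and the per-class count layout are ours, not the paper's): counts are pooled across all relation classes before computing the scores, so frequent classes weigh more than in macro-averaging.

```python
def micro_prf(class_counts):
    """Micro-averaged precision, recall, F1.

    `class_counts` is a list of (tp, fp, fn) tuples, one per relation
    class. Micro-averaging pools the counts first, then scores once.
    Illustrative helper only; not code from the paper.
    """
    tp = sum(c[0] for c in class_counts)
    fp = sum(c[1] for c in class_counts)
    fn = sum(c[2] for c in class_counts)
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```

Since the paper averages over five random seeds, the reported numbers would be the mean of five such scores, one per trial.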
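The setup row describes splitting documents into overlapping encoder windows of 256 tokens with a step size of 32. A hedged sketch of that preprocessing, assuming simple token-index windows (the function name and return convention are ours; the paper does not publish its implementation):

```python
def sliding_windows(tokens, window=256, step=32):
    """Split a token sequence into overlapping windows.

    Each window holds up to `window` tokens and starts `step` tokens
    after the previous one; a document shorter than `window` yields a
    single window. Illustrative sketch, not the authors' code.
    """
    if len(tokens) <= window:
        return [tokens]
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows
```

With a step far smaller than the window size, consecutive windows overlap heavily, so most tokens are encoded several times; implementations along these lines typically merge the duplicated token representations (e.g., by averaging) before downstream reasoning.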