Two-phase Multi-document Event Summarization on Core Event Graphs

Authors: Zengjian Chen, Jin Xu, Meng Liao, Tong Xue, Kun He

JAIR 2022

Reproducibility Variable Result LLM Response
Research Type | Experimental | "For experiments in the new task, we construct two large-scale real-world datasets for training and assessment. Extensive evaluations show that the proposed framework significantly outperforms the related baseline methods, with the most dominant event of the articles effectively identified and correctly summarized." (Abstract) ... (Section 4: Experiments)
Researcher Affiliation | Collaboration | Zengjian Chen (WeChat, Tencent Inc., Shenzhen, Guangdong, China; Huazhong University of Science and Technology); Jin Xu (corresponding author; School of Future Technology, South China University of Technology, Guangzhou, Guangdong, China); Meng Liao and Tong Xue (WeChat, Tencent Inc., Shenzhen, Guangdong, China); Kun He (corresponding author; Huazhong University of Science and Technology, Wuhan, Hubei, China)
Pseudocode | No | The paper describes methodologies through text and mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | "To facilitate evaluation and further research on MES, we have created two large-scale datasets, one annotated by professional editors, while the other be collected from crawling and search results." (Abstract) ... (footnote 2) https://drive.google.com/drive/folders/1QX28zDhkh_oHzi_Vy_Ovt_Ym0GvmcPTAdI99
Dataset Splits | Yes | "We randomly select 80% of the data as the training data, and use the remaining data for development and test (10% for each)."
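The reported 80%/10%/10% random split can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, seed, and the assumption that the corpus is a flat list of examples are all hypothetical:

```python
import random

def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=42):
    """Randomly split examples into train/dev/test (80/10/10 by default).

    Hypothetical helper: the paper only states the split ratios, not how
    the shuffling or seeding was done.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

train, dev, test = split_dataset(range(1000))
print(len(train), len(dev), len(test))  # 800 100 100
```

Seeding the shuffle makes the split repeatable, which matters when comparing against the reported development/test numbers.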
Hardware Specification | Yes | "All models are trained on a single Tesla M40 GPU."
Software Dependencies | No | "We implement all the mentioned models in Tensorflow except Trunc., ILP and Graph-gen." (Section 4.3) ... "For the two new Chinese datasets (TMES, SMES), we do word segmentation with the Jieba (Sun, 2012) tool for word counting." (Table 1)
Experiment Setup | Yes | "We use a two-layer bi-directional LSTM-RNN encoder and a one-layer uni-directional LSTM-RNN decoder along with the attention mechanism... The vocabulary size is set to 50k... We initialize a 128-dimensional word embedding... optimized with AdaGrad (batch size = 128). The initial learning rate and the accumulator value were set to 0.15 and 0.1, respectively. We use gradient clipping with a maximum gradient norm of 2... For hyper-parameter settings, we tune γ = 0.2 and λ = 0.3 for our model. At the test time, our short event summaries are produced with a decoder whose beam search size is set to 8 and the maximum decoding step size is set to 15."
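The reported training settings can be collected into a single configuration sketch, together with the gradient-clipping step the paper mentions. This is an illustrative reconstruction, not the authors' implementation; the dictionary key names and the `clip_by_global_norm` helper are hypothetical (TensorFlow provides an equivalent built-in):

```python
import math

# Hyper-parameters as reported in the paper (key names are assumptions).
CONFIG = {
    "encoder": "2-layer bidirectional LSTM-RNN",
    "decoder": "1-layer unidirectional LSTM-RNN with attention",
    "vocab_size": 50_000,
    "embedding_dim": 128,
    "optimizer": "AdaGrad",
    "batch_size": 128,
    "learning_rate": 0.15,
    "adagrad_accumulator_init": 0.1,
    "max_gradient_norm": 2.0,
    "gamma": 0.2,    # tuned hyper-parameter γ
    "lambda_": 0.3,  # tuned hyper-parameter λ
    "beam_size": 8,
    "max_decode_steps": 15,
}

def clip_by_global_norm(grads, max_norm=CONFIG["max_gradient_norm"]):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Simplified sketch over a flat list of scalar gradients; real frameworks
    apply the same scaling across all parameter tensors at once.
    """
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0])  # global norm 5.0 -> scaled to norm 2.0
```

Clipping by global norm (rather than per-element) preserves the direction of the update while bounding its magnitude, which is the standard remedy for exploding gradients in LSTM training.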