A General Approach to Multimodal Document Quality Assessment

Authors: Aili Shen, Bahar Salehi, Jianzhong Qi, Timothy Baldwin

JAIR 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Our joint model achieves state-of-the-art results over five datasets in two domains (Wikipedia and academic papers), which demonstrates the complementarity of textual and visual features, and the general applicability of our model. To examine what kinds of features our model has learned, we further train our model in a multi-task learning setting, where document quality assessment is the primary task and feature learning is an auxiliary task. Experimental results show that visual embeddings are better at learning structural features while textual embeddings are better at learning readability scores, which further verifies the complementarity of visual and textual features.
Researcher Affiliation | Academia | Aili Shen EMAIL Bahar Salehi EMAIL Jianzhong Qi EMAIL Timothy Baldwin EMAIL School of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia
Pseudocode | No | The paper describes methods like the Inception V3 model and bi-directional LSTM, and outlines their combination in Figure 2, but does not present them in the form of structured pseudocode or algorithm blocks. Descriptions are narrative and supported by diagrams.
Open Source Code | Yes | All code and data are available at https://github.com/AiliAili/MultiModal.
Open Datasets | Yes | All code and data are available at https://github.com/AiliAili/MultiModal.
Dataset Splits | Yes | We additionally randomly partitioned this dataset into training, development, and test splits based on a ratio of 8:1:1. Details of the dataset are presented in Table 1.
Hardware Specification | No | The paper describes software dependencies and experimental settings but does not specify any particular hardware used for running the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper mentions software tools like NLTK, ImageMagick, and PyPDF2, but does not provide specific version numbers for these or any other software dependencies crucial for replication.
Experiment Setup | Yes | We set the LSTM hidden layer size to 256. A dropout layer with a probability of 0.5 is applied at both the sentence and document levels. For the bi-LSTM, we use a mini-batch size of 128 and a learning rate of 0.001. For both Inception and the joint model, we use a mini-batch size of 16 and a learning rate of 0.0001. All hyper-parameters were set empirically over the development data, and the models are optimized using the Adam optimizer (Kingma & Ba, 2015). We train each model for 50 epochs; however, to prevent overfitting, we adopt early stopping, where we stop training if the performance on the development set does not improve for 20 epochs. For Inception, we adopt data augmentation during training with a nearest filling mode, a zoom range of 0.1, a width shift range of 0.1, and a height shift range of 0.1.
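The early-stopping rule quoted in the Experiment Setup row (train for up to 50 epochs, halt once development-set performance fails to improve for 20 consecutive epochs) can be sketched in plain Python. This is a minimal illustration of the stated rule only; `train_one_epoch` and `evaluate_dev` are hypothetical stand-ins, not routines from the authors' released code.

```python
MAX_EPOCHS = 50   # total training budget reported in the paper
PATIENCE = 20     # epochs without dev-set improvement before stopping

def train_with_early_stopping(train_one_epoch, evaluate_dev):
    """Run up to MAX_EPOCHS epochs, stopping early when the development-set
    score has not improved for PATIENCE consecutive epochs.

    Returns (best development score, number of epochs actually run).
    """
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch in range(MAX_EPOCHS):
        train_one_epoch(epoch)
        score = evaluate_dev()
        if score > best_score:
            best_score = score              # new best on the development set
            epochs_without_improvement = 0  # reset the patience counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= PATIENCE:
                break                       # no improvement for PATIENCE epochs
    return best_score, epoch + 1
```

For example, if the development score improves for two epochs and then plateaus, training stops 20 epochs after the last improvement rather than running the full 50.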