Does Entity Abstraction Help Generative Transformers Reason?

Authors: Nicolas Gontier, Siva Reddy, Christopher Pal

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning... Overall, our analysis demonstrates that models with abstract entity knowledge perform better than without it. The best abstraction-aware models achieved an overall accuracy of 88.8% and 91.8%, compared to the baseline model's 62.9% and 89.8%, on CLUTRR and ProofWriter respectively.
Researcher Affiliation | Collaboration | Nicolas Gontier (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; Polytechnique Montreal, Canada; ServiceNow Research. Siva Reddy (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; McGill University, Montreal, Canada; Facebook CIFAR AI Chair; ServiceNow Research. Christopher Pal (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; Polytechnique Montreal, Canada; Canada CIFAR AI Chair; ServiceNow Research.
Pseudocode | No | The paper describes its methods in Sections 3.1, 3.2, and 3.3 and illustrates the architectures with figures (Figure 1a-e). It does not, however, include any sections or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | No | The paper mentions using "the AllenNLP library (Gardner et al., 2017) with the Hugging Face transformers library (Wolf et al., 2019) PyTorch implementation of T5-small" but does not provide a specific link or an explicit statement about releasing the source code for its own methodology or implementation.
Open Datasets | Yes | We study the utility of incorporating entity type abstractions... and test these methods on four NLP tasks...: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). CLUTRR (Sinha et al., 2019); ProofWriter (Clark et al., 2020; Tafjord et al., 2021); HotpotQA (Yang et al., 2018); CoQA (Reddy et al., 2019). "We used the real abstraction labels provided by the grammar files, and defined the following abstraction tokens: PERSON, ATTRIBUTE, ANIMAL, RELATION." (grammar files: https://tinyurl.com/proofwritergrammars)
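The paper does not release code, so the following is only an illustrative sketch of the idea behind abstraction tokens: each entity mention in the input is tagged with its type (PERSON, ATTRIBUTE, ANIMAL, RELATION) before the text reaches the seq2seq model. The entity dictionary, the `<TYPE>`-prefix tagging scheme, and the helper name are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of entity-type abstraction: prefix each known entity
# mention with its abstraction token. The dictionary below is illustrative.
ENTITY_TYPES = {
    "Alice": "PERSON",
    "Bob": "PERSON",
    "cat": "ANIMAL",
    "kind": "ATTRIBUTE",
    "sister": "RELATION",
}

def abstract_input(text: str) -> str:
    """Prefix each known entity mention with its abstraction token."""
    tokens = []
    for word in text.split():
        stripped = word.strip(".,")  # ignore trailing punctuation when matching
        etype = ENTITY_TYPES.get(stripped)
        if etype is not None:
            tokens.append(f"<{etype}> {word}")
        else:
            tokens.append(word)
    return " ".join(tokens)

print(abstract_input("Alice is the sister of Bob."))
# <PERSON> Alice is the <RELATION> sister of <PERSON> Bob.
```

In practice such tokens would also be registered with the tokenizer as special tokens so they are not split into subwords.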
Dataset Splits | Yes | We generated 390,000 examples that were roughly split 77/23 between training and testing. Each example consists of a unique (non-cyclic) family graph... We fine-tuned a T5-small model on 300,000 training examples of levels 2, 4 and 6 and evaluated the model on 9 test sets of 10,000 examples each, covering all levels from 2 to 10. We fine-tuned T5-small models on the official training and development sets from the depth <= 2 data folder and tested them on the test set from the depth <= 5 data folder, consisting of 70,076 training examples and 20,030 testing examples. We used the official validation set as our test set... and fine-tuned a T5-small model on 90% of the training set while keeping the remaining 10% as our custom validation set for early stopping.
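The quoted CLUTRR numbers are internally consistent: 300,000 training examples plus 9 test sets of 10,000 examples account for all 390,000 generated examples, and the resulting proportions match the "roughly 77/23" description. A quick arithmetic check:

```python
# Sanity-check the reported CLUTRR split sizes against the "roughly 77/23"
# train/test description quoted from the paper.
total = 390_000
train = 300_000
test = 9 * 10_000  # 9 test sets (levels 2-10) of 10,000 examples each

assert train + test == total  # the splits account for every generated example

train_pct = 100 * train / total
test_pct = 100 * test / total
print(f"{train_pct:.1f}/{test_pct:.1f}")  # 76.9/23.1
```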
Hardware Specification | Yes | Each experiment was run on Tesla V100 32GB GPUs with early stopping and a patience of 10 epochs on the validation set (defined as a 10% split of the training set).
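The stopping rule referenced here is standard patience-based early stopping: training halts once the validation loss has failed to improve for a fixed number of consecutive epochs (10 in the paper). A minimal generic sketch of that rule, not the AllenNLP trainer's actual implementation:

```python
# Minimal patience-based early stopping on validation loss. The paper reports
# patience=10 epochs; this class is an illustration of the rule, not the
# AllenNLP trainer's code.
class EarlyStopping:
    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)  # small patience so the demo stops quickly
losses = [1.0, 0.9, 0.95, 0.93, 0.91]  # best at epoch 2, no improvement after
print([stopper.step(loss) for loss in losses])
# [False, False, False, False, True]
```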
Software Dependencies | Yes | We used the AllenNLP library (Gardner et al., 2017) with the Hugging Face transformers library (Wolf et al., 2019) PyTorch implementation of T5-small with 16-bit floating point precision... We report all hyper-parameters and library versions in Appendix A for reproducibility purposes. Table 7: Library versions and model hyper-parameters: AllenNLP 2.2.0; Transformers 4.4.2; spaCy 2.3.5.
Experiment Setup | Yes | We report all hyper-parameters and library versions in Appendix A for reproducibility purposes. Table 7: Library versions and model hyper-parameters: batch size 256; 16-bit floating point: True; embedding dim 512; feedforward dim 2048; key-value dim 64; dropout 0.1; max length 512; # of heads 8; # of layers 6; optimizer AdamW; learning rate 1e-5; betas [0.9, 0.999]; epsilon 1e-8; gradient norm 1.0; sampler top-p with p 0.9; temperature 1.0.
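The model dimensions in Table 7 are exactly T5-small's architecture constants (d_model=512, d_ff=2048, d_kv=64, 8 heads, 6 layers per stack). As a sketch, the reported values can be collected into a single config dict; the key names here are assumptions for illustration, not the authors' actual config schema:

```python
# Table 7 hyper-parameters gathered into one config dict. Key names are
# illustrative, not the authors' schema; the values are as reported.
CONFIG = {
    "batch_size": 256,
    "fp16": True,               # 16-bit floating point training
    "dim_embedding": 512,       # T5-small d_model
    "dim_feedforward": 2048,    # T5-small d_ff
    "dim_key_value": 64,        # T5-small d_kv
    "dropout": 0.1,
    "max_length": 512,
    "num_heads": 8,
    "num_layers": 6,
    "optimizer": "AdamW",
    "learning_rate": 1e-5,
    "betas": (0.9, 0.999),
    "epsilon": 1e-8,
    "max_grad_norm": 1.0,
    "sampler": "top-p",
    "top_p": 0.9,
    "temperature": 1.0,
}

# Consistency check: attention dims factorize as in T5-small
# (num_heads * d_kv == d_model).
assert CONFIG["num_heads"] * CONFIG["dim_key_value"] == CONFIG["dim_embedding"]
```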