Does Entity Abstraction Help Generative Transformers Reason?

Authors: Nicolas Gontier, Siva Reddy, Christopher Pal

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning... Overall, our analysis demonstrates that models with abstract entity knowledge perform better than without it. The best abstraction-aware models achieved an overall accuracy of 88.8% and 91.8%, compared to the baseline model's 62.9% and 89.8%, on CLUTRR and ProofWriter respectively.
Researcher Affiliation | Collaboration | Nicolas Gontier (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; Polytechnique Montreal, Canada; ServiceNow Research. Siva Reddy (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; McGill University, Montreal, Canada; Facebook CIFAR AI Chair; ServiceNow Research. Christopher Pal (EMAIL): Quebec Artificial Intelligence Institute (Mila), Montreal, Canada; Polytechnique Montreal, Canada; Canada CIFAR AI Chair; ServiceNow Research.
Pseudocode | No | The paper describes its methods in Sections 3.1, 3.2, and 3.3 and illustrates the architectures with figures (Figure 1a-e). It does not, however, include any sections or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | No | The paper mentions using "the AllenNLP library (Gardner et al., 2017) with the Hugging Face transformers library (Wolf et al., 2019) PyTorch implementation of T5-small" but does not provide a specific link or an explicit statement about releasing the source code for its own methodology or implementation.
Open Datasets | Yes | We study the utility of incorporating entity type abstractions... and test these methods on four NLP tasks...: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). CLUTRR (Sinha et al., 2019); ProofWriter (Clark et al., 2020; Tafjord et al., 2021); HotpotQA (Yang et al., 2018); CoQA (Reddy et al., 2019). "We used the real abstraction labels provided by the grammar files, and defined the following abstraction tokens: PERSON, ATTRIBUTE, ANIMAL, RELATION." (grammar files: https://tinyurl.com/proofwritergrammars)
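The paper does not release code, so the following is only an illustrative sketch of the idea behind abstraction tokens: each entity mention in the input is tagged with its type (PERSON, ATTRIBUTE, ANIMAL, RELATION) before the text reaches the seq2seq model. The entity dictionary, the `<TYPE>`-prefix tagging scheme, and the helper name are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of entity-type abstraction: prefix each known entity
# mention with its abstraction token. The dictionary below is illustrative.
ENTITY_TYPES = {
    "Alice": "PERSON",
    "Bob": "PERSON",
    "cat": "ANIMAL",
    "kind": "ATTRIBUTE",
    "sister": "RELATION",
}

def abstract_input(text: str) -> str:
    """Prefix each known entity mention with its abstraction token."""
    tokens = []
    for word in text.split():
        stripped = word.strip(".,")  # ignore trailing punctuation when matching
        etype = ENTITY_TYPES.get(stripped)
        if etype is not None:
            tokens.append(f"<{etype}> {word}")
        else:
            tokens.append(word)
    return " ".join(tokens)

print(abstract_input("Alice is the sister of Bob."))
# <PERSON> Alice is the <RELATION> sister of <PERSON> Bob.
```

In practice such tokens would also be registered with the tokenizer as special tokens so they are not split into subwords.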
Dataset Splits | Yes | We generated 390,000 examples that were roughly split 77/23 between training and testing. Each example consists of a unique (non-cyclic) family graph... We fine-tuned a T5-small model on 300,000 training examples of levels 2, 4 and 6 and evaluated the model on 9 test sets of 10,000 examples each, covering all levels from 2 to 10. We fine-tuned T5-small models on the official training and development sets from the depth <= 2 data folder and tested them on the test set from the depth <= 5 data folder, consisting of 70,076 training examples and 20,030 testing examples. We used the official validation set as our test set... and fine-tuned a T5-small model on 90% of the training set while keeping the remaining 10% as our custom validation set for early stopping.
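The quoted CLUTRR numbers are internally consistent: 300,000 training examples plus 9 test sets of 10,000 examples account for all 390,000 generated examples, and the resulting proportions match the "roughly 77/23" description. A quick arithmetic check:

```python
# Sanity-check the reported CLUTRR split sizes against the "roughly 77/23"
# train/test description quoted from the paper.
total = 390_000
train = 300_000
test = 9 * 10_000  # 9 test sets (levels 2-10) of 10,000 examples each

assert train + test == total  # the splits account for every generated example

train_pct = 100 * train / total
test_pct = 100 * test / total
print(f"{train_pct:.1f}/{test_pct:.1f}")  # 76.9/23.1
```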
Hardware Specification | Yes | Each experiment was run on Tesla V100 32GB GPUs with early stopping and a patience of 10 epochs on the validation set (defined as a 10% split of the training set).
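The stopping rule referenced here is standard patience-based early stopping: training halts once the validation loss has failed to improve for a fixed number of consecutive epochs (10 in the paper). A minimal generic sketch of that rule, not the AllenNLP trainer's actual implementation:

```python
# Minimal patience-based early stopping on validation loss. The paper reports
# patience=10 epochs; this class is an illustration of the rule, not the
# AllenNLP trainer's code.
class EarlyStopping:
    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)  # small patience so the demo stops quickly
losses = [1.0, 0.9, 0.95, 0.93, 0.91]  # best at epoch 2, no improvement after
print([stopper.step(loss) for loss in losses])
# [False, False, False, False, True]
```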
Software Dependencies | Yes | We used the AllenNLP library (Gardner et al., 2017) with the Hugging Face transformers library (Wolf et al., 2019) PyTorch implementation of T5-small with 16-bit floating point precision... We report all hyper-parameters and library versions in Appendix A for reproducibility purposes. Table 7: Library versions and model hyper-parameters: AllenNLP 2.2.0; Transformers 4.4.2; spaCy 2.3.5.
Experiment Setup | Yes | We report all hyper-parameters and library versions in Appendix A for reproducibility purposes. Table 7: Library versions and model hyper-parameters: batch size 256; 16-bit floating point: True; embedding dim 512; feedforward dim 2048; key-value dim 64; dropout 0.1; max length 512; # of heads 8; # of layers 6; optimizer AdamW; learning rate 1e-5; betas [0.9, 0.999]; epsilon 1e-8; gradient norm 1.0; sampler top-p with p 0.9; temperature 1.0.
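The model dimensions in Table 7 are exactly T5-small's architecture constants (d_model=512, d_ff=2048, d_kv=64, 8 heads, 6 layers per stack). As a sketch, the reported values can be collected into a single config dict; the key names here are assumptions for illustration, not the authors' actual config schema:

```python
# Table 7 hyper-parameters gathered into one config dict. Key names are
# illustrative, not the authors' schema; the values are as reported.
CONFIG = {
    "batch_size": 256,
    "fp16": True,               # 16-bit floating point training
    "dim_embedding": 512,       # T5-small d_model
    "dim_feedforward": 2048,    # T5-small d_ff
    "dim_key_value": 64,        # T5-small d_kv
    "dropout": 0.1,
    "max_length": 512,
    "num_heads": 8,
    "num_layers": 6,
    "optimizer": "AdamW",
    "learning_rate": 1e-5,
    "betas": (0.9, 0.999),
    "epsilon": 1e-8,
    "max_grad_norm": 1.0,
    "sampler": "top-p",
    "top_p": 0.9,
    "temperature": 1.0,
}

# Consistency check: attention dims factorize as in T5-small
# (num_heads * d_kv == d_model).
assert CONFIG["num_heads"] * CONFIG["dim_key_value"] == CONFIG["dim_embedding"]
```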