Tensorized Attention for Understanding Multi-Object Relationships
Authors: Makoto Nakatsuji, Yasuhiro Fujiwara, Atsushi Otsuka, Narichika Nomoto, Yoshihide Sato
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations show that TAM significantly outperforms existing encoder methods, and its integration into the LoRA adapter for Llama2 enhances fine-tuning accuracy. ... To showcase its effectiveness, we integrated TAM into the Transformer encoder and evaluated its performance in response selection and question-answering tasks. ... When tested with NFL and Politics datasets compiled from Reddit as well as the TweetQA dataset (Xiong et al. 2019), TAM consistently outperformed existing Transformer-based methods in accuracy. ... In evaluations on Reddit and SQuAD (Rajpurkar et al. 2016) datasets, we observed that TAM enhances accuracy in both response and answer generation tasks. |
| Researcher Affiliation | Industry | ¹NTT Human Informatics Laboratories, ²NTT Communication Science Laboratories |
| Pseudocode | No | The paper describes the Multi-object Attention Computation using mathematical equations and refers to an implementation detail using `torch.einsum`, but it does not present a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | NFL and Politics: Two datasets, NFL and Politics, were compiled by sampling the posts from their respective communities on Reddit between September 2018 and February 2019 for the response selection evaluation. The dataset is accessible through BigQuery (Henderson et al. 2019). ... TweetQA: We also used the TweetQA dataset (Xiong et al. 2019). ... SQuAD: The SQuAD 1.1 dataset is a collection of question-answer pairs derived from Wikipedia articles. |
| Dataset Splits | Yes | The NFL dataset has 230,060 dialogues in the training set (averaging 4.2 utterances and 56.3 words) and 13,765 dialogues in the test set (averaging 4.2 utterances and 57.6 words). The Politics dataset includes 290,020 dialogues in the training set (averaging 4.8 utterances and 81.1 words) and 19,040 dialogues in the test set (averaging 4.9 utterances and 81.5 words). ... It includes 10,692 training triples and 1,979 test triples. ... It contains 87,599 question-answer-passage triples in the training set and 10,570 in the test set. |
| Hardware Specification | Yes | The experiments were performed using an NVIDIA A100 GPU with 80GB of memory. |
| Software Dependencies | No | The paper mentions `torch.einsum` and the AdamW optimizer but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | The embedding size of words, D0, was set to 768 as per (Devlin et al. 2019). The learning rate was set to 1×10⁻⁵. We used the AdamW optimizer with beta values of 0.9 and 0.999 and an epsilon of 1×10⁻⁸, following (Loshchilov and Hutter 2019). Sufficient convergence was achieved with 20 (100) epochs for pre-training and 15 (20) epochs for fine-tuning on the NFL and Politics (TweetQA) datasets. We set the dimension size of Q, K, V, and S to 192 for the NFL dataset, 256 for the Politics dataset, and 160 for the TweetQA dataset. The batch size was 96 for all methods and datasets. |
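The optimizer settings quoted in the Experiment Setup row can be expressed directly as a PyTorch configuration. The sketch below is a minimal illustration, not the authors' code: the model is a placeholder stand-in for the TAM-augmented encoder, and only the reported hyperparameters (learning rate 1×10⁻⁵, betas 0.9/0.999, epsilon 1×10⁻⁸, embedding size 768) come from the paper.

```python
import torch

# Placeholder for the TAM-augmented Transformer encoder; the paper's
# actual architecture is not released. D0 = 768 is the reported
# word-embedding size (following Devlin et al. 2019).
model = torch.nn.Linear(768, 768)

# AdamW hyperparameters as reported in the paper
# (following Loshchilov and Hutter 2019).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # learning rate 1x10^-5
    betas=(0.9, 0.999),  # beta values
    eps=1e-8,            # epsilon 1x10^-8
)
```

A reproduction attempt would pair this with a batch size of 96 and the reported epoch counts per dataset; since no code or dependency versions are released, any such script is a reconstruction from the prose.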