Tensorized Attention for Understanding Multi-Object Relationships
Authors: Makoto Nakatsuji, Yasuhiro Fujiwara, Atsushi Otsuka, Narichika Nomoto, Yoshihide Sato
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations show that TAM significantly outperforms existing encoder methods, and its integration into the LoRA adapter for Llama2 enhances fine-tuning accuracy. ... To showcase its effectiveness, we integrated TAM into the Transformer encoder and evaluated its performance in response selection and question-answering tasks. ... When tested with NFL and Politics datasets compiled from Reddit as well as the TweetQA dataset (Xiong et al. 2019), TAM consistently outperformed existing Transformer-based methods in accuracy. ... In evaluations on Reddit and SQuAD (Rajpurkar et al. 2016) datasets, we observed that TAM enhances accuracy in both response and answer generation tasks. |
| Researcher Affiliation | Industry | ¹NTT Human Informatics Laboratories, ²NTT Communication Science Laboratories |
| Pseudocode | No | The paper describes the Multi-object Attention Computation using mathematical equations and refers to an implementation detail using `torch.einsum`, but it does not present a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | NFL and Politics: Two datasets, NFL and Politics, were compiled by sampling the posts from their respective communities on Reddit between September 2018 and February 2019 for the response selection evaluation. The dataset is accessible through BigQuery (Henderson et al. 2019). ... TweetQA: We also used the TweetQA dataset (Xiong et al. 2019). ... SQuAD: The SQuAD 1.1 dataset is a collection of question-answer pairs derived from Wikipedia articles. |
| Dataset Splits | Yes | The NFL dataset has 230,060 dialogues in the training set (averaging 4.2 utterances and 56.3 words) and 13,765 dialogues in the test set (averaging 4.2 utterances and 57.6 words). The Politics dataset includes 290,020 dialogues in the training set (averaging 4.8 utterances and 81.1 words) and 19,040 dialogues in the test set (averaging 4.9 utterances and 81.5 words). ... It includes 10,692 training triples and 1,979 test triples. ... It contains 87,599 question-answer-passage triples in the training set and 10,570 in the test set. |
| Hardware Specification | Yes | The experiments were performed using an NVIDIA A100 GPU with 80GB of memory. |
| Software Dependencies | No | The paper mentions `torch.einsum` and the AdamW optimizer but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | The embedding size of words, D0, was set to 768 as per (Devlin et al. 2019). The learning rate was set to 1×10⁻⁵. We used the AdamW optimizer with beta values of 0.9 and 0.999 and an epsilon of 1×10⁻⁸, following (Loshchilov and Hutter 2019). Sufficient convergence was achieved with 20 (100) epochs for pre-training and 15 (20) epochs for fine-tuning on the NFL and Politics (TweetQA) datasets. We set the dimension size of Q, K, V, and S to 192 for the NFL dataset, 256 for the Politics dataset, and 160 for the TweetQA dataset. The batch size was 96 for all methods and datasets. |
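The optimizer settings quoted in the Experiment Setup row can be expressed directly as a PyTorch configuration. The sketch below is a minimal illustration, not the authors' code: the model is a placeholder stand-in for the TAM-augmented encoder, and only the reported hyperparameters (learning rate 1×10⁻⁵, betas 0.9/0.999, epsilon 1×10⁻⁸, embedding size 768) come from the paper.

```python
import torch

# Placeholder for the TAM-augmented Transformer encoder; the paper's
# actual architecture is not released. D0 = 768 is the reported
# word-embedding size (following Devlin et al. 2019).
model = torch.nn.Linear(768, 768)

# AdamW hyperparameters as reported in the paper
# (following Loshchilov and Hutter 2019).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # learning rate 1x10^-5
    betas=(0.9, 0.999),  # beta values
    eps=1e-8,            # epsilon 1x10^-8
)
```

A reproduction attempt would pair this with a batch size of 96 and the reported epoch counts per dataset; since no code or dependency versions are released, any such script is a reconstruction from the prose.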