SimRP: Syntactic and Semantic Similarity Retrieval Prompting Enhances Aspect Sentiment Quad Prediction
Authors: Zhongquan Jian, Yanhao Chen, Jiajian Li, Shaopan Wang, Xiangjian Zeng, Junfeng Yao, Xinying An, Qingqiang Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in Supervised Fine-Tuning (SFT) and In-context Learning (ICL) paradigms demonstrate the effectiveness of SimRP. Furthermore, we find that LLMs' capabilities in ASQP are severely underestimated by biased data annotations and the exact-matching metric. We propose a novel constituent subtree-based fuzzy metric for more accurate and rational quadruple recognition. Main Results: We execute experiments three times with fixed seeds of [2024, 2025, 2026] and report the mean values in Table 2. The best results are marked in bold and the second-best underlined. Overall, our approach greatly outperforms existing methods, with an average F1 score improvement of 2.20%. |
| Researcher Affiliation | Academia | 1Institute of Artificial Intelligence, Xiamen University, Xiamen, China 2School of Informatics, Xiamen University, Xiamen, China 3School of Film, Xiamen University, Xiamen, China 4School of Journalism and Communication, Xiamen University, Xiamen, China 5Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China 6Xiamen Key Laboratory of Intelligent Storage and Computing, School of Informatics, Xiamen University EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Syn2Vec. 1: Initialize vocabulary V = ∅. 2: for each sentence x in the training set T do 3: let xq denote an arbitrary word of a or o in x. 4: Parse x to obtain its CSTs: spaCy(x). 5: for cst in set(spaCy(x)) do 6: if xq in cst and cst not in V then 7: F(cst) = 0 (initialize the frequency of cst) 8: V = V ∪ {cst} (add cst to V) 9: end if 10: Increment F(cst) (count the frequency of cst) 11: end for 12: end for 13: Sort V by F in descending order. 14: Compute the IDF value of each cst in V: IDF(cst). 15: return V. |
| Open Source Code | Yes | Code https://github.com/jian-projects/simrp |
| Open Datasets | Yes | Experiments are carried out on two widely used ASQP datasets, i.e., Rest15 and Rest16, with their statistics shown in Table 1. These datasets were initially constructed from the SemEval tasks (Pontiki et al. 2015, 2016) and have undergone multiple rounds of re-annotation (Peng et al. 2020a; Wan et al. 2020). Zhang et al. (2021a) finally aligned these datasets, which now serve as the standard benchmarks for the ASQP task. |
| Dataset Splits | Yes | Table 1 (dataset statistics; N denotes the number of sentences): Rest15: Train 834, Valid 209, Test 537; Rest16: Train 1264, Valid 316, Test 544. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud platforms) are provided for running the experiments. The paper only mentions using pre-trained models like T5-large, GPT-4o, and Llama-3.1. |
| Software Dependencies | No | The paper mentions using the 'spaCy' tool to parse CTs, the 'Sentence-BERT model' for semantic vectors, and 'T5-large' as the pre-trained model backbone. However, no specific version numbers for these or other software libraries are provided. |
| Experiment Setup | Yes | The dimension of the syntactic vector d is set to 28 in our experiments. During model training, the maximum number of epochs is set to 10 and the batch size to 4 for all experiments. AdamW with an initial learning rate of 8e-5 is used as the optimizer, and linear scheduling is applied to adjust the learning rate. In the data preparation stage we set k = 10, i.e., we retrieve the top 10 syntactically similar demonstrations from the training set for each sentence. The number of concatenated demonstrations is set to 5 for Rest15 and 1 for Rest16. |
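As a rough illustration of the Algorithm 1 (Syn2Vec) pseudocode quoted above, the sketch below builds a frequency-sorted subtree vocabulary with IDF values and derives a d-dimensional syntactic vector. The `parse_csts` helper is hypothetical: it stands in for the paper's spaCy-based constituent subtree extraction (word bigrams approximate subtrees here), and `target_words` stands in for the aspect/opinion words a and o.

```python
import math
from collections import Counter

def parse_csts(sentence):
    # Hypothetical stand-in for spaCy-based constituent subtree (CST)
    # extraction; word bigrams approximate subtrees for illustration.
    words = sentence.split()
    return ["_".join(words[i:i + 2]) for i in range(len(words) - 1)]

def syn2vec_vocab(training_set, target_words):
    """Algorithm 1 sketch: collect CSTs containing an aspect/opinion
    word, count each once per sentence, sort by frequency, compute IDF."""
    freq = Counter()
    for sentence in training_set:
        for cst in set(parse_csts(sentence)):  # dedupe within a sentence
            if any(w in cst for w in target_words):
                freq[cst] += 1
    vocab = sorted(freq, key=freq.get, reverse=True)
    n = len(training_set)
    idf = {cst: math.log(n / freq[cst]) for cst in vocab}
    return vocab, idf

def syn_vector(sentence, vocab, idf, d=28):
    """IDF-weighted presence vector over the top-d subtrees
    (d = 28 in the paper's reported setup)."""
    csts = set(parse_csts(sentence))
    return [idf[c] if c in csts else 0.0 for c in vocab[:d]]
```

This is only a sketch under simplified assumptions; the actual repository (github.com/jian-projects/simrp) should be consulted for the real subtree extraction.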
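The retrieval step described in the setup (top-k similar demonstrations for each sentence) can be sketched as below. The `embed` function is a toy bag-of-words hash embedding, a hypothetical stand-in for the Sentence-BERT encoder mentioned in the paper; a real implementation would call `SentenceTransformer.encode` instead.

```python
import math

def embed(sentence, dim=16):
    # Toy bag-of-words hash embedding; hypothetical stand-in for
    # Sentence-BERT semantic vectors.
    vec = [0.0] * dim
    for word in sentence.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demonstrations(query, pool, k=10):
    """Rank candidate training sentences by similarity to the query and
    return the k most similar as in-context demonstrations (k = 10 in
    the paper's data-preparation stage)."""
    q = embed(query)
    ranked = sorted(pool, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]
```

The same top-k interface applies whether the similarity is syntactic (vectors from Syn2Vec) or semantic (Sentence-BERT); only the embedding function changes.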
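The linear learning-rate scheduling mentioned in the setup can be sketched as a simple decay function; the name `linear_lr` is illustrative, and this assumes no warmup phase (the report does not specify one).

```python
def linear_lr(step, total_steps, base_lr=8e-5):
    """Linearly decay the learning rate from base_lr (8e-5 in the
    paper's setup) down to 0 over total_steps optimizer steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return base_lr * max(0.0, 1.0 - step / total_steps)
```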