Descriptive and Discriminative Document Identifiers for Generative Retrieval

Authors: Jiehan Cheng, Zhicheng Dou, Yutao Zhu, Xiaoxi Li

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experimental results on the MS MARCO and NQ320k dataset illustrate the effectiveness of the approach." |
| Researcher Affiliation | Academia | Gaoling School of Artificial Intelligence, Renmin University of China (EMAIL, EMAIL, EMAIL) |
| Pseudocode | No | The paper describes the methodology using textual descriptions and mathematical equations, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "We experiment on two widely recognized datasets: MS MARCO (Bajaj et al. 2016) and Natural Questions (NQ) (Kwiatkowski et al. 2019)." |
| Dataset Splits | Yes | "Following NOVO (Wang et al. 2023), we eliminate duplicate documents in NQ based on document titles and use the training set and the validation set divided in NQ as our training set and testing set. ... and use the training set and the dev set divided in MS MARCO as our training set and testing set." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using T5-base as the base model and nltk for n-gram processing, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "On MS300k, we choose similarity threshold λ1 = 0.99, MRR threshold λ2 = 0.1 to improve the diversity of the synthetic queries so as to reflect the document from multiple perspectives, while on NQ320k, we set similarity threshold λ1 = 0.99, MRR threshold λ2 = 0.6 to improve the retrieval performance of the query. We choose the number of n-grams ng = 3 to compose the Doc IDs on both datasets." |
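As a rough illustration of the quoted setting (ng = 3 n-grams composing each Doc ID), the sketch below extracts candidate word n-grams from a document. This is a hypothetical stand-in, not the paper's released code: the paper's filtering by the similarity (λ1) and MRR (λ2) thresholds depends on trained models, so plain frequency is used here as the ranking criterion, and the `candidate_ngrams` helper name is an assumption.

```python
# Hypothetical sketch: picking n-grams that could compose a Doc ID.
# Frequency ranking substitutes for the paper's λ1/λ2 model-based filtering.
from collections import Counter


def candidate_ngrams(text, n=3, num_ids=3):
    """Return the num_ids most frequent word n-grams of the text."""
    tokens = text.lower().split()
    # Build all contiguous n-token windows.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in grams)
    return [gram for gram, _ in counts.most_common(num_ids)]
```

A real pipeline would tokenize with nltk (as the paper mentions) and score candidates against synthetic queries rather than raw counts.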