reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Survey on Out-of-Distribution Detection in NLP

Authors: Hao Lang, Yinhe Zheng, Yixuan Li, Jian Sun, Fei Huang, Yongbin Li

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this survey, we provide a comprehensive review of OOD detection methods in NLP. We formalize the OOD detection tasks and identify the major challenges of OOD detection in NLP. A taxonomy of existing OOD detection methods is also provided. We hope this survey helps researchers locate their target problems and find the most suitable datasets, metrics, and baselines. Moreover, we also provide some promising directions that can inspire future research and exploration. Finally, we do not present any new empirical results. It would be helpful to perform comparative experiments over different OOD detection methods (Yang et al., 2022). We leave this as future work.
Researcher Affiliation	Collaboration	Hao Lang (Alibaba Group); Yinhe Zheng (Alibaba Group); Yixuan Li (Department of Computer Sciences, University of Wisconsin-Madison); Jian Sun (Alibaba Group); Fei Huang (Alibaba Group); Yongbin Li (Alibaba Group)
Pseudocode	No	The paper describes methodologies in prose and through a taxonomy diagram, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper is a survey and does not present new empirical results or a novel methodology that would typically involve a dedicated code release. There are no explicit statements about releasing code or links to a code repository for the work described in this paper.
Open Datasets	Yes	CLINC150 (Larson et al., 2019), Banking (Casanueva et al., 2020), Stack Overflow (Xu et al., 2015), STAR (Mosig et al., 2020), and ROSTD (Gangal et al., 2020) are mentioned and cited in Appendix B, which gives specific references to publicly available datasets commonly used in the field. These are standard academic datasets with proper attribution.
Dataset Splits	No	The paper is a survey and does not conduct new experiments requiring specific dataset splits for reproduction. It mentions various datasets and their characteristics but does not provide split information for its own work.
Hardware Specification	No	The paper is a survey and explicitly states, 'Finally, we do not present any new empirical results.' Therefore, no hardware specifications for running experiments are provided.
Software Dependencies	No	The paper is a survey and does not implement a new methodology. It discusses various existing software and models but does not specify software dependencies with version numbers for its own contribution.
Experiment Setup	No	The paper is a survey and explicitly states, 'Finally, we do not present any new empirical results.' Therefore, no experimental setup details like hyperparameters or training configurations are provided for the work described in this paper.