reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis

Authors: Luan Zhang, Dandan Song, Zhijing Wu, Yuhang Tian, Changzhi Zhou, Jing Xu, Ziyi Yang, Shuhao Zhang

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that MHAD outperforms existing hallucination detection methods across multiple LLMs, demonstrating superior effectiveness. (Abstract) ... Section 4 Experiments, 4.1 Experiment setting, Dataset and Metrics: We evaluate MHAD and other baselines on our proposed SOQHD dataset. Consistent with previous studies [Chen et al., 2024; Du et al., 2024], we use AUROC as the evaluation metric.
Researcher Affiliation	Academia	(1) School of Computer Science and Technology, Beijing Institute of Technology, China; (2) School of Cyberspace Science and Technology, Beijing Institute of Technology, China; (3) School of Computer Science and Technology, Huazhong University of Science and Technology, China
Pseudocode	No	The paper describes methods and processes using mathematical formulations (e.g., equations 1-12) and descriptive text, but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Project: https://github.com/Z-Luan/DIRA-HD
Open Datasets	Yes	To evaluate MHAD thoroughly, we develop SOQHD (Sustainable Open-Domain QA Hallucination Detection), a novel benchmark for hallucination detection in ODQA. ... We also evaluate on the existing HaluEval [Li et al., 2023] dataset. ... This step begins with the manual annotation of a small sample of questions from the development sets of TriviaQA [Joshi et al., 2017] and NQ [Kwiatkowski et al., 2019], which are widely used ODQA benchmarks.
Dataset Splits	Yes	The training set of SOQHD contains a total of 2000 questions, and the test set comprises 500 questions. ... For the hyperparameters α and top-k used for neuron and layer selection, the settings are determined using the separate validation set, which is a randomly sampled 20% subset from the SOQHD training set.
Hardware Specification	Yes	All experiments are conducted on a single RTX A6000.
Software Dependencies	No	The paper mentions components such as the Adam optimizer and the ReLU activation function, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup	Yes	The MHAD classifier employs a 4-layer MLP for hallucination detection, with its input corresponding to the dimension of the hallucination awareness vector. The hidden layers have dimensions of 1024 and 128, respectively. The ReLU activation function is used between layers, with a dropout rate of 0.5. The classifier is optimized using Adam with a learning rate of 1e-5, a weight decay of 1e-2, and a training batch size of 64. For the hyperparameters α and top-k used for neuron and layer selection, the settings are determined using the separate validation set, which is a randomly sampled 20% subset from the SOQHD training set.
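
The evaluation protocol quoted in the table (a random 20% validation subset drawn from the 2000-question SOQHD training set, with AUROC as the metric) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `split_train_validation` and `auroc`, the seed, and the integer question IDs are assumptions made for illustration. The paper excerpt also does not say whether the validation questions are removed from the data used to fit the classifier; this sketch assumes they are held out.

```python
import random


def split_train_validation(question_ids, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the training questions for validation.

    Assumption: the validation questions are removed from the training pool.
    """
    ids = list(question_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for the positive class (e.g. hallucinated), 0 otherwise.
    scores: higher means more likely positive. Tied scores share an average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Usage: 2000 hypothetical training question IDs, 20% held out for validation.
train_ids, val_ids = split_train_validation(range(2000))

# Toy detector scores for four validation examples (illustrative values only).
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With the sizes quoted above, the split yields 1600 training and 400 validation questions. In this setup, hyperparameters such as α and top-k would be chosen by comparing validation AUROC across candidate settings.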