Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis
Authors: Luan Zhang, Dandan Song, Zhijing Wu, Yuhang Tian, Changzhi Zhou, Jing Xu, Ziyi Yang, Shuhao Zhang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that MHAD outperforms existing hallucination detection methods across multiple LLMs, demonstrating superior effectiveness. (Abstract) ... Section 4 Experiments 4.1 Experiment setting Dataset and Metrics. We evaluate MHAD and other baselines on our proposed SOQHD dataset. Consistent with previous studies [Chen et al., 2024; Du et al., 2024], we use AUROC as the evaluation metric. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Beijing Institute of Technology, China; 2 School of Cyberspace Science and Technology, Beijing Institute of Technology, China; 3 School of Computer Science and Technology, Huazhong University of Science and Technology, China |
| Pseudocode | No | The paper describes methods and processes using mathematical formulations (e.g., equations 1-12) and descriptive text, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project: https://github.com/Z-Luan/DIRA-HD |
| Open Datasets | Yes | To evaluate MHAD thoroughly, we develop SOQHD (Sustainable Open-Domain QA Hallucination Detection), a novel benchmark for hallucination detection in ODQA. ... We also evaluate on the existing HaluEval [Li et al., 2023] dataset. ... This step begins with the manual annotation of a small sample of questions from the development sets of TriviaQA [Joshi et al., 2017] and NQ [Kwiatkowski et al., 2019], which are widely used ODQA benchmarks. |
| Dataset Splits | Yes | The training set of SOQHD contains a total of 2000 questions, and the test set comprises 500 questions. ... For the hyperparameters α and top-k used for neuron and layer selection, the settings are determined using the separate validation set, which is a randomly sampled 20% subset from the SOQHD training set. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX A6000. |
| Software Dependencies | No | The paper mentions components such as the Adam optimizer and the ReLU activation function, but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The MHAD classifier employs a 4-layer MLP for hallucination detection, with its input corresponding to the dimension of the hallucination awareness vector. The hidden layers have dimensions of 1024 and 128, respectively. The ReLU activation function is used between layers, with a dropout rate of 0.5. The classifier is optimized using Adam with a learning rate of 1e-5, a weight decay of 1e-2, and a training batch size of 64. For the hyperparameters α and top-k used for neuron and layer selection, the settings are determined using the separate validation set, which is a randomly sampled 20% subset from the SOQHD training set. |
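
The split and training settings quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `ExperimentSetup` and `split_train_val`, the field names, and the random seed are hypothetical. It also assumes the 20% validation subset is held out from the 2000 training questions while the classifier is fit, which the quoted text does not state explicitly.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the table above; the field names are hypothetical."""
    hidden_dims: tuple = (1024, 128)  # hidden layer sizes of the MLP classifier
    dropout: float = 0.5
    learning_rate: float = 1e-5       # Adam learning rate
    weight_decay: float = 1e-2
    batch_size: int = 64
    val_fraction: float = 0.2         # share of the training set used for validation
    train_questions: int = 2000       # SOQHD training set size
    test_questions: int = 500         # SOQHD test set size


def split_train_val(n_items, val_fraction, seed=0):
    """Randomly hold out a fraction of item indices for validation.

    The seed is an assumption; the source does not report one.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_val = round(n_items * val_fraction)
    return indices[n_val:], indices[:n_val]


setup = ExperimentSetup()
train_idx, val_idx = split_train_val(setup.train_questions, setup.val_fraction)

# Ceiling division: number of mini-batches needed to cover the training indices.
batches_per_epoch = -(-len(train_idx) // setup.batch_size)

print(len(train_idx), len(val_idx), batches_per_epoch)
```

Under these assumptions the split leaves 1600 questions for fitting and 400 for selecting α and top-k, so one epoch at batch size 64 covers 25 mini-batches.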