Steer LLM Latents for Hallucination Detection
Authors: Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that TSV achieves state-of-the-art performance with minimal labeled data, exhibiting strong generalization across datasets and providing a practical solution for real-world LLM applications. ... Extensive experiments demonstrate the strong performance of our method across diverse datasets. ... In Table 1, we compare TSV with competitive hallucination detection methods from the literature. ... 5.3. Ablation studies |
| Researcher Affiliation | Academia | 1Department of Computer Sciences, University of Wisconsin-Madison 2School of Software Technology, Zhejiang University. Correspondence to: Yixuan Li <EMAIL>. |
| Pseudocode | Yes | A Algorithms A.1. Overall training framework Algorithm 1 Overall training framework A.2. Sinkhorn algorithm Algorithm 2 Sinkhorn algorithm for entropic-regularized optimal transport |
| Open Source Code | Yes | Code is available at: https://github.com/deeplearning-wisc/tsv. |
| Open Datasets | Yes | We evaluate our method on four generative question-answering (QA) tasks: three open-domain QA datasets TruthfulQA (Lin et al., 2022a), TriviaQA (Joshi et al., 2017), and NQ Open (Kwiatkowski et al., 2019); and a domain-specific QA dataset SciQ (Welbl et al., 2017). ... TruthfulQA (Lin et al., 2022a), TriviaQA (Joshi et al., 2017), SciQ (Welbl et al., 2017), and NQ Open (Kwiatkowski et al., 2019). |
| Dataset Splits | Yes | For evaluation, 25% of the QA pairs from each dataset are reserved for testing. Consistent with Du et al. (2024), 100 QA pairs are used for validation, while the remaining samples simulate the unlabeled training dataset. |
| Hardware Specification | Yes | We conducted all experiments using Python 3.8.15 and PyTorch 2.3.1 (Paszke et al., 2019) on NVIDIA A100 GPUs. |
| Software Dependencies | Yes | We conducted all experiments using Python 3.8.15 and PyTorch 2.3.1 (Paszke et al., 2019) on NVIDIA A100 GPUs. |
| Experiment Setup | Yes | Class prototypes µc and TSV v are randomly initialized, and trained in two stages: 20 epochs using only the exemplar set, followed by an additional 20 epochs after augmentation. Training is performed using the AdamW optimizer (Loshchilov, 2019), with a learning rate of 5e-03 and a batch size of 128. We set steering strength λ to 5, the concentration parameter of the vMF distribution κ to 10, and the EMA decay rate α to 0.99. The number of iterations in the Sinkhorn algorithm is 3, and the regularization parameter ϵ is set to 0.05, following Caron et al. (2020). |
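The setup above references a Sinkhorn algorithm for entropic-regularized optimal transport with 3 iterations and ϵ = 0.05, following Caron et al. (2020). A minimal NumPy sketch of that normalization scheme is shown below; the function name, array orientation (samples × classes), and the uniform-marginal assumption are illustrative choices, not taken from the paper's released code.

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Entropic-regularized optimal-transport normalization (Sinkhorn-Knopp).

    scores: (n_samples, n_classes) similarity logits.
    Returns soft assignments where each row is a distribution over classes,
    while class marginals are pushed toward uniform (as in Caron et al., 2020).
    """
    Q = np.exp(scores / eps)   # Gibbs kernel from the scores
    Q /= Q.sum()               # normalize to a joint distribution
    n, k = Q.shape
    for _ in range(n_iters):
        # enforce (approximately) uniform class marginals
        Q /= Q.sum(axis=0, keepdims=True)
        Q /= k
        # enforce (approximately) uniform sample marginals
        Q /= Q.sum(axis=1, keepdims=True)
        Q /= n
    return Q * n  # rescale so each row sums to 1
```

Small ϵ sharpens the assignments toward hard labels; the reported ϵ = 0.05 keeps them nearly one-hot while the few (3) iterations keep the balancing cheap per training step.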
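The dataset-split protocol quoted above (25% of QA pairs reserved for testing, 100 pairs for validation, the remainder treated as the unlabeled training set, following Du et al., 2024) can be sketched as follows; the function name and fixed seed are hypothetical, added only to make the protocol concrete.

```python
import random

def split_qa_pairs(qa_pairs, test_frac=0.25, n_val=100, seed=0):
    """Illustrative split: 25% test, 100 validation, rest unlabeled train."""
    pairs = list(qa_pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for the sketch
    n_test = int(len(pairs) * test_frac)
    test = pairs[:n_test]
    val = pairs[n_test:n_test + n_val]
    train_unlabeled = pairs[n_test + n_val:]
    return train_unlabeled, val, test
```

For example, a dataset of 1,000 QA pairs would yield 250 test pairs, 100 validation pairs, and 650 pairs simulating the unlabeled training set.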