Steer LLM Latents for Hallucination Detection

Authors: Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that TSV achieves state-of-the-art performance with minimal labeled data, exhibiting strong generalization across datasets and providing a practical solution for real-world LLM applications. ... Extensive experiments demonstrate the strong performance of our method across diverse datasets. ... In Table 1, we compare TSV with competitive hallucination detection methods from the literature. ... 5.3. Ablation studies
Researcher Affiliation Academia 1Department of Computer Sciences, University of Wisconsin-Madison 2School of Software Technology, Zhejiang University. Correspondence to: Yixuan Li <EMAIL>.
Pseudocode Yes A Algorithms A.1. Overall training framework Algorithm 1 Overall training framework A.2. Sinkhorn algorithm Algorithm 2 Sinkhorn algorithm for entropic-regularized optimal transport
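The paper's Algorithm 2 is the Sinkhorn algorithm for entropic-regularized optimal transport (following Caron et al., 2020). As a point of reference for the pseudocode, here is a minimal pure-Python sketch of that standard procedure; the function name and interface are illustrative, not taken from the released code.

```python
import math

def sinkhorn(scores, n_iters=3, eps=0.05):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    scores: B x C list of similarity logits (B samples, C class prototypes).
    Alternately normalizes columns (each prototype receives equal total mass)
    and rows (each sample distributes unit mass), then rescales rows so each
    is a probability distribution usable as a soft assignment.
    """
    B, C = len(scores), len(scores[0])
    # Initialize the transport plan from exponentiated, temperature-scaled scores.
    Q = [[math.exp(s / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column normalization: each of the C columns sums to 1 / C.
        col_sums = [sum(Q[b][c] for b in range(B)) for c in range(C)]
        Q = [[Q[b][c] / (C * col_sums[c]) for c in range(C)] for b in range(B)]
        # Row normalization: each of the B rows sums to 1 / B.
        for b in range(B):
            row_sum = sum(Q[b])
            Q[b] = [q / (B * row_sum) for q in Q[b]]
    # Rescale so each row sums to 1 (a valid soft label per sample).
    return [[q * B for q in row] for row in Q]
```

The defaults mirror the paper's reported settings (3 iterations, regularization 0.05).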
Open Source Code Yes Code is available at: https://github.com/deeplearning-wisc/tsv.
Open Datasets Yes We evaluate our method on four generative question-answering (QA) tasks: three open-domain QA datasets TruthfulQA (Lin et al., 2022a), TriviaQA (Joshi et al., 2017), and NQ Open (Kwiatkowski et al., 2019); and a domain-specific QA dataset SciQ (Welbl et al., 2017). ... TruthfulQA (Lin et al., 2022a), TriviaQA (Joshi et al., 2017), SciQ (Welbl et al., 2017), and NQ Open (Kwiatkowski et al., 2019).
Dataset Splits Yes For evaluation, 25% of the QA pairs from each dataset are reserved for testing. Consistent with Du et al. (2024), 100 QA pairs are used for validation, while the remaining samples simulate the unlabeled training dataset.
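The described protocol (25% of QA pairs held out for testing, 100 pairs for validation, the remainder as the unlabeled training pool) can be sketched as follows; the function name, seeding, and shuffling details are assumptions for illustration and may differ from the released code.

```python
import random

def split_qa_pairs(pairs, seed=0):
    """Illustrative split per the paper's protocol (cf. Du et al., 2024):
    25% test, 100 validation, remainder simulates the unlabeled train set."""
    rng = random.Random(seed)
    pairs = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(pairs)
    n_test = len(pairs) // 4  # 25% reserved for testing
    test, rest = pairs[:n_test], pairs[n_test:]
    val, unlabeled = rest[:100], rest[100:]
    return unlabeled, val, test
```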
Hardware Specification Yes We conducted all experiments using Python 3.8.15 and PyTorch 2.3.1 (Paszke et al., 2019) on NVIDIA A100 GPUs.
Software Dependencies Yes We conducted all experiments using Python 3.8.15 and PyTorch 2.3.1 (Paszke et al., 2019) on NVIDIA A100 GPUs.
Experiment Setup Yes Class prototypes µc and TSV v are randomly initialized and trained in two stages: 20 epochs using only the exemplar set, followed by an additional 20 epochs after augmentation. Training is performed using the AdamW optimizer (Loshchilov, 2019), with a learning rate of 5e-03 and a batch size of 128. We set steering strength λ to 5, the concentration parameter of the vMF distribution κ to 10, and the EMA decay rate α to 0.99. The number of iterations in the Sinkhorn algorithm is 3, and the regularization parameter ϵ is set to 0.05, following Caron et al. (2020).
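For quick reference, the reported hyperparameters can be collected into a single configuration; the key names below are illustrative and not taken from the released code.

```python
# Hyperparameters as reported in the paper's experiment setup.
config = {
    "epochs_stage1": 20,       # training on the exemplar set only
    "epochs_stage2": 20,       # continued training after augmentation
    "optimizer": "AdamW",      # Loshchilov (2019)
    "learning_rate": 5e-3,
    "batch_size": 128,
    "steering_strength_lambda": 5,
    "vmf_concentration_kappa": 10,
    "ema_decay_alpha": 0.99,
    "sinkhorn_iterations": 3,
    "sinkhorn_epsilon": 0.05,  # entropic regularization, following Caron et al. (2020)
}
```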