A Survey on the Honesty of Large Language Models
Authors: Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To address the aforementioned challenges and promote further research on the honesty of LLMs, we provide an extensive overview of current studies in this area. Figure 1 shows the outline of this survey. We start by summarizing the widely accepted and inclusive definitions of the honesty of LLMs from previous research (§2). Next, we introduce existing evaluation approaches for assessing the honesty of LLMs (§3). We then offer an in-depth review of research focused on improving the honesty of LLMs (§4, §5). Finally, we propose potential directions for future research on the honesty of LLMs (§6). |
| Researcher Affiliation | Collaboration | (1) The Chinese University of Hong Kong; (2) The University of Hong Kong; (3) Tsinghua University; (4) University of Illinois at Urbana-Champaign; (5) University of Virginia; (6) Peking University; (7) WeChat AI |
| Pseudocode | No | The paper is a survey and outlines concepts, definitions, evaluation approaches, and improvement strategies for LLM honesty. It does not present any novel algorithms or procedures in pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | No | The paper states: "We will constantly update the related research at https://github.com/SihengLi99/LLM-Honesty-Survey." This repository tracks related research for the survey itself rather than providing source code for a specific experimental methodology developed within this paper. |
| Open Datasets | Yes | Representative benchmarks in this approach include SelfAware (Yin et al., 2023), KUQ (Amayuelas et al., 2023), UnknownBench (Liu et al., 2024a), HoneSet (Gao et al., 2024) and BeHonest (Chern et al., 2024). These benchmarks generally assume that the model's pre-training corpus forms its knowledge base. For example, Yin et al. (2023) consider Wikipedia part of the model's known knowledge, as it is often included in pre-training data. Therefore, questions sourced from Wikipedia, such as those in SQuAD (Rajpurkar et al., 2016), can be treated as known questions. |
| Dataset Splits | No | The paper is a survey of existing research and does not conduct its own experiments or define new datasets or dataset splits for reproduction. It references various existing datasets and benchmarks but does not specify how they should be split for new experiments. |
| Hardware Specification | No | The paper is a survey and does not report on any new experimental results or specify the hardware used for any experiments conducted by the authors. |
| Software Dependencies | No | The paper is a survey and does not describe a specific experimental setup requiring software dependencies with version numbers. |
| Experiment Setup | No | The paper is a survey and does not include details of an experimental setup, such as hyperparameters or training configurations, as it does not present new experimental results. |