Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents

Authors: Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This position paper argues that the traditional dichotomy of uncertainties is too limited for the open and interactive setting in which LLM agents operate when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: underspecification uncertainties, for when users do not provide all information or define the exact task at the first go; interactive learning, to ask follow-up questions and reduce the uncertainty about the current context; and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers.
Researcher Affiliation | Academia | ¹University of Tübingen; now at Apple. ²Technical University of Munich.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks; it discusses theoretical concepts and proposes new research directions.
Open Source Code | No | The paper is a position paper and does not describe a methodology for which source code would typically be released. There is no mention of open-source code or code availability.
Open Datasets | No | The paper occasionally refers to datasets such as Natural Questions (Kwiatkowski et al., 2019a) and ImageNet-1k (Mucsányi et al., 2024), but it does not present its own experimental results on these or other datasets, nor does it provide access information for any dataset it directly uses or releases.
Dataset Splits | No | The paper is theoretical and does not conduct experiments or analyze data; therefore, no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not involve experimental runs requiring specific hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any implementation details that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is a position paper focused on theoretical concepts and new research avenues, not empirical experimentation; therefore, no experiment setup details, hyperparameters, or training configurations are provided.