reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Agentic Large Language Models, a Survey

Authors: Aske Plaat, Max van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, Kees Joost Batenburg

JAIR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Objectives: We review the growing body of work in this area and provide a research agenda. Methods: Agentic LLMs are LLMs that (1) reason, (2) act, and (3) interact. We organize the literature according to these three categories. Results: The research in the first category focuses on reasoning, reflection, and retrieval, aiming to improve decision making; the second category focuses on action models, robots, and tools, aiming for agents that act as useful assistants; the third category focuses on multi-agent systems, aiming for collaborative task solving and simulating interaction to study emergent social behavior. We find that works mutually benefit from results in other categories: retrieval enables tool use, reflection improves multi-agent collaboration, and reasoning benefits all categories. Conclusions: We discuss applications of agentic LLMs and provide an agenda for further research.
Researcher Affiliation	Academia	ASKE PLAAT, Leiden University, Netherlands; MAX VAN DUIJN, Leiden University, Netherlands; NIKI VAN STEIN, Leiden University, Netherlands; MIKE PREUSS, Leiden University, Netherlands; PETER VAN DER PUTTEN, Leiden University & AI Lab, Pegasystems, Netherlands; KEES JOOST BATENBURG, Leiden University, Netherlands
Pseudocode	Yes	Figure 9 provides pseudo-code for the algorithm, in which the three calls to the LLM are clearly shown.
Open Source Code	No	The paper is a survey and does not provide explicit open-source code for its own methodology. It mentions open-source models and tools that others have developed.
Open Datasets	No	This paper is a survey of existing literature and does not introduce or provide access to a new dataset for its own methodology. It discusses various datasets and benchmarks used by the papers it surveys.
Dataset Splits	No	This paper is a survey and does not conduct experiments with dataset splits. It describes methodologies and findings from other research papers.
Hardware Specification	No	This paper is a survey and does not describe the specific hardware used to run its own experiments. It mentions hardware in the context of other surveyed papers, but not for its own methodology.
Software Dependencies	No	This paper is a survey and does not specify software dependencies with version numbers for its own methodology. It discusses various software tools and frameworks used in the surveyed literature.
Experiment Setup	No	This paper is a survey and does not detail an experimental setup with hyperparameters or system-level training settings for its own methodology. Its methodology is primarily literature review and taxonomy creation.