Position: We Need An Algorithmic Understanding of Generative AI

Authors: Oliver Eberle, Thomas Austin McGee, Hamza Giaffar, Taylor Whittington Webb, Ida Momennejad

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To ground our position in an empirical example, we conducted a case study focused on LLMs, which have been shown to perform poorly on graph navigation and multi-step planning tasks (Momennejad et al., 2023). In cases where they do succeed, it remains unclear how they solve these problems, e.g., whether they implement classic search algorithms or use other strategies. To address this question, we studied the algorithms employed by two widely used LLMs, instruction-tuned Llama-3.1 with 8B and 70B parameters, in the context of graph navigation.
Researcher Affiliation | Collaboration | (1) Technische Universität Berlin, Berlin, Germany; (2) BIFOLD Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; (3) University of California Los Angeles, Los Angeles, USA; (4) Halıcıoğlu Data Science Institute, University of California San Diego, San Diego, USA; (5) Microsoft Research NYC, New York, USA.
Pseudocode | No | The paper discusses algorithmic concepts and methodologies but does not provide any structured pseudocode or algorithm blocks. Figure 1 shows a conceptual flow diagram, not pseudocode.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes a case study using 'instruction-tuned Llama-3.1 with 8B and 70B parameters' and a 'simple tree graph structure, presented in a prompt'. While Llama models are generally available, the paper does not provide concrete access information (link, citation for dataset) for any specific dataset used in their experiments, beyond describing the task and the prompt itself.
Dataset Splits | No | The paper describes a case study involving the analysis of pre-trained LLMs on a specific graph navigation task defined by a prompt. It does not involve traditional training/validation/test dataset splits, so no such split information is provided.
Hardware Specification | No | The paper mentions using Llama-3.1-8B and 70B models for experiments but does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running these analyses.
Software Dependencies | Yes | Mixed-effects modeling was conducted using the lmerTest package in R.
Experiment Setup | Yes | Prompt. We introduce the model to a two-step tree graph following the prompt from Momennejad et al. (2023), which demonstrated that LLMs struggle with graph navigation and especially tree search. The model is tasked with determining the validity of a given path, producing a single-token output: yes or no. The full prompt and task, starting from the lobby with goal location W, are shown in Figure 3a. We next present results on Llama-3.1-8B, with additional analyses of the 70B model presented in Appendix A.4.
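The task structure described above can be sketched in code: a two-step tree graph, a classic breadth-first search (one candidate algorithm the LLMs may or may not implement internally), and a ground-truth yes/no labeler for candidate paths. This is a minimal illustrative sketch, not the paper's implementation; the node names and adjacency structure are assumptions, since the exact graph is defined by the prompt in the paper's Figure 3a.

```python
from collections import deque

# Hypothetical two-step tree rooted at the lobby: the lobby connects to two
# rooms, and each room connects to two leaf locations (including goal W).
# These labels are illustrative assumptions, not the paper's actual graph.
TREE = {
    "lobby": ["room1", "room2"],
    "room1": ["X", "W"],
    "room2": ["Y", "Z"],
}

def bfs_path(graph, start, goal):
    """Classic breadth-first search over the tree: returns the first
    path from start to goal, or None if the goal is unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

def label(graph, path):
    """Ground-truth single-token label for the path-validity task:
    'yes' if every consecutive step follows an edge, else 'no'."""
    valid = all(b in graph.get(a, []) for a, b in zip(path, path[1:]))
    return "yes" if valid else "no"
```

A model's single-token answer to a candidate path such as `["lobby", "room2", "W"]` could then be compared against `label(TREE, ...)`, while `bfs_path` serves as a reference for what a classic search strategy would produce.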