Position: We Need An Algorithmic Understanding of Generative AI
Authors: Oliver Eberle, Thomas Austin Mcgee, Hamza Giaffar, Taylor Whittington Webb, Ida Momennejad
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To ground our position in an empirical example, we conducted a case study focused on LLMs, which have been shown to perform poorly on graph navigation and multi-step planning tasks (Momennejad et al., 2023). In cases where they do succeed, it remains unclear how they solve these problems, e.g., whether they implement classic search algorithms or use other strategies. To address this question, we studied the algorithms employed by widely used LLMs, namely instruction-tuned Llama-3.1 models with 8B and 70B parameters, in the context of graph navigation. |
| Researcher Affiliation | Collaboration | 1Technische Universität Berlin, Berlin, Germany 2BIFOLD Berlin Institute for the Foundations of Learning and Data, Berlin, Germany 3University of California Los Angeles, Los Angeles, USA 4Halıcıoğlu Data Science Institute, University of California San Diego, San Diego, USA 5Microsoft Research NYC, New York, USA. |
| Pseudocode | No | The paper discusses algorithmic concepts and methodologies but does not provide any structured pseudocode or algorithm blocks. Figure 1 shows a conceptual flow diagram, not pseudocode. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper describes a case study using 'instruction-tuned Llama-3.1 with 8B and 70B parameters' and a 'simple tree graph structure, presented in a prompt'. While Llama models are generally available, the paper does not provide concrete access information (link, citation for dataset) for any specific dataset used in their experiments, beyond describing the task and the prompt itself. |
| Dataset Splits | No | The paper describes a case study involving the analysis of pre-trained LLMs on a specific graph navigation task defined by a prompt. It does not involve traditional dataset training, validation, and testing splits, and therefore no such split information is provided. |
| Hardware Specification | No | The paper mentions using Llama-3.1-8B and 70B models for experiments but does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running these analyses. |
| Software Dependencies | Yes | Mixed-effects modeling was conducted using the lmerTest package in R. |
| Experiment Setup | Yes | Prompt. We introduce the model to a two-step tree graph following the prompt from Momennejad et al. (2023), which demonstrated that LLMs struggle with graph navigation and especially tree search. The model is tasked with determining the validity of a given path, producing a single-token output: 'yes' or 'no'. The full prompt and task for starting from the lobby and goal location W are shown in Figure 3a. We next present results on Llama-3.1-8B, with additional analyses of the 70B model in Appendix A.4. |
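The path-validity task quoted above can be sketched programmatically. This is a minimal illustrative sketch, not the paper's actual prompt: the room names and tree edges below are assumptions chosen for illustration; only the "lobby" start node and the goal location "W" appear in the source description.

```python
# Illustrative sketch of the two-step tree-graph path-validity task.
# The edge structure below is an assumption for illustration; only
# 'lobby' (start) and 'W' (goal) are named in the source.
TREE_EDGES = {
    "lobby": {"room1", "room2"},   # first step: lobby branches into rooms
    "room1": {"X", "Y"},           # second step: each room branches into leaves
    "room2": {"W", "Z"},
}

def path_is_valid(path):
    """Return 'yes' if every consecutive pair in `path` is a tree edge,
    mirroring the single-token yes/no answer the LLM is asked to produce."""
    for src, dst in zip(path, path[1:]):
        if dst not in TREE_EDGES.get(src, set()):
            return "no"
    return "yes"

print(path_is_valid(["lobby", "room2", "W"]))  # valid two-step path -> yes
print(path_is_valid(["lobby", "W"]))           # skips a room -> no
```

A ground-truth checker of this kind makes it possible to score the model's yes/no outputs against the actual graph structure, independent of whatever strategy the model uses internally.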