Large Language Model-Brained GUI Agents: A Survey

Authors: Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper presents a comprehensive survey of LLM-powered GUI agents, exploring their historical evolution, core components, and advanced techniques. We address critical research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of fine-tuned models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness.
Researcher Affiliation | Collaboration | (1) Microsoft; (2) Shanghai Artificial Intelligence Laboratory; (3) Peking University
Pseudocode | No | The paper describes methods and processes in prose but does not include any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.1 'Architecture and Workflow In a Nutshell' describes the workflow in natural language and a diagram, but not as pseudocode.
Open Source Code | No | The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey. Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration. This refers to the papers reviewed in the survey, not the source code for any methodology presented in this survey paper itself.
Open Datasets | No | This paper is a survey and does not present new experimental results using a specific dataset that it makes available. While it discusses and references numerous datasets used by other papers (e.g., in Section 6 'Data for Optimizing LLM-Powered GUI Agents'), it does not provide access information for a dataset used in its own experimental work.
Dataset Splits | No | This paper is a survey and does not conduct its own experiments; therefore, it does not provide dataset split information for its own work. It discusses dataset splits in the context of other research papers, such as in Section 8.1 regarding 'Step Success Rate' and other metrics from benchmarks.
Hardware Specification | No | This paper is a survey and does not conduct its own experiments, nor does it describe a specific methodology that requires a hardware setup. Therefore, it does not provide details about the hardware used for running experiments.
Software Dependencies | No | This paper is a survey and does not describe a new methodology requiring specific software implementations. While it mentions various software tools and platforms (e.g., Selenium, Appium) in the context of other research, it does not list specific versioned software dependencies for its own work.
Experiment Setup | No | This paper is a survey and does not conduct its own experiments. Therefore, it does not contain specific experimental setup details such as hyperparameters or training configurations for its own work.