Large Language Model-Brained GUI Agents: A Survey
Authors: Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a comprehensive survey of LLM-powered GUI agents, exploring their historical evolution, core components, and advanced techniques. We address critical research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of fine-tuned models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. |
| Researcher Affiliation | Collaboration | ¹Microsoft, ²Shanghai Artificial Intelligence Laboratory, ³Peking University |
| Pseudocode | No | The paper describes methods and processes in prose but does not include any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.1 'Architecture and Workflow In a Nutshell' describes the workflow in natural language and a diagram, but not as pseudocode. |
| Open Source Code | No | The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey. Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration. This refers to the papers reviewed in the survey, not the source code for any methodology presented in this survey paper itself. |
| Open Datasets | No | This paper is a survey and does not present new experimental results using a specific dataset that it makes available. While it discusses and references numerous datasets used by other papers (e.g., in Section 6 'Data for Optimizing LLM-Powered GUI Agents'), it does not provide access information for a dataset used in its own experimental work. |
| Dataset Splits | No | This paper is a survey and does not conduct its own experiments; therefore, it does not provide dataset split information for its own work. It discusses dataset splits in the context of other research papers, such as in Section 8.1 regarding 'Step Success Rate' and other metrics from benchmarks. |
| Hardware Specification | No | This paper is a survey and does not conduct its own experiments, nor does it describe a specific methodology that requires a hardware setup. Therefore, it does not provide details about the hardware used for running experiments. |
| Software Dependencies | No | This paper is a survey and does not describe a new methodology requiring specific software implementations. While it mentions various software tools and platforms (e.g., Selenium, Appium) in the context of other research, it does not list specific versioned software dependencies for its own work. |
| Experiment Setup | No | This paper is a survey and does not conduct its own experiments. Therefore, it does not contain specific experimental setup details such as hyperparameters or training configurations for its own work. |