reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Large Language Model-Brained GUI Agents: A Survey

Authors: Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This paper presents a comprehensive survey of LLM-powered GUI agents, exploring their historical evolution, core components, and advanced techniques. We address critical research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of fine-tuned models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness.
Researcher Affiliation	Collaboration	(1) Microsoft; (2) Shanghai Artificial Intelligence Laboratory; (3) Peking University
Pseudocode	No	The paper describes methods and processes in prose but does not include any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.1 'Architecture and Workflow In a Nutshell' describes the workflow in natural language and a diagram, but not as pseudocode.
Open Source Code	No	The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey. Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration. This refers to the papers reviewed in the survey, not the source code for any methodology presented in this survey paper itself.
Open Datasets	No	This paper is a survey and does not present new experimental results using a specific dataset that it makes available. While it discusses and references numerous datasets used by other papers (e.g., in Section 6 'Data for Optimizing LLM-Powered GUI Agents'), it does not provide access information for a dataset used in its own experimental work.
Dataset Splits	No	This paper is a survey and does not conduct its own experiments; therefore, it does not provide dataset split information for its own work. It discusses dataset splits in the context of other research papers, such as in Section 8.1 regarding 'Step Success Rate' and other metrics from benchmarks.
Hardware Specification	No	This paper is a survey and does not conduct its own experiments, nor does it describe a specific methodology that requires a hardware setup. Therefore, it does not provide details about the hardware used for running experiments.
Software Dependencies	No	This paper is a survey and does not describe a new methodology requiring specific software implementations. While it mentions various software tools and platforms (e.g., Selenium, Appium) in the context of other research, it does not list specific versioned software dependencies for its own work.
Experiment Setup	No	This paper is a survey and does not conduct its own experiments. Therefore, it does not contain specific experimental setup details such as hyperparameters or training configurations for its own work.