MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines

Authors: Yaolun Zhang, Xiaogeng Liu, Chaowei Xiao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and achieves performance comparable to the human-designed multi-agent systems optimized for those specific tasks. The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/.
Researcher Affiliation | Academia | 1University of Wisconsin-Madison, Madison, USA. Correspondence to: Chaowei Xiao <EMAIL>.
Pseudocode | Yes | Algorithm 1: FSM State Optimization; Algorithm 2: Deployment Stage
Open Source Code | Yes | The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/.
Open Datasets | Yes | Firstly, we compare MetaAgent with other prompt-based methods on Trivia Creative Writing (Wang et al., 2024d) and GPQA (Rein et al., 2023). Machine Learning Bench (ML-Bench) (Hong et al., 2024a) is a benchmark that requires agents to train a machine-learning model for regression or classification.
Dataset Splits | Yes | # Load the dataset: train_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_train.csv"; eval_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_eval.csv"
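The evidence above shows pre-split train/eval CSV files for the Titanic task. One common way such an 80/20 split is produced is scikit-learn's `train_test_split`; this is an illustrative sketch of that general procedure, not the paper's confirmed split method, and the row count here is a stand-in.

```python
# Hedged sketch: producing a train/eval split like the quoted
# split_train.csv / split_eval.csv files. The 80/20 ratio and the
# use of train_test_split are assumptions for illustration only.
from sklearn.model_selection import train_test_split

rows = list(range(100))  # stand-in for Titanic passenger records
train_rows, eval_rows = train_test_split(rows, test_size=0.2, random_state=0)
# train_rows holds 80 records, eval_rows holds the remaining 20,
# and fixing random_state makes the split reproducible.
```

Fixing `random_state` mirrors the reproducibility concern raised elsewhere in this review (temperature 0, fixed seeds in the model code).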
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as GPU or CPU models. It only states the foundation model used (GPT-4o).
Software Dependencies | No | The paper names several software libraries in its code examples (pandas, sklearn, etc.) but does not provide version numbers for these dependencies, which reproducibility requires.
Experiment Setup | Yes | We selected GPT-4o as the foundation model in the main experiments and set the temperature to 0 to ensure reproducibility. model = RandomForestClassifier(n_estimators=100, random_state=0); model = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier(random_state=42))])
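The quoted setup references a scikit-learn `Pipeline` with a `preprocessor` step that is not shown in full. The sketch below reconstructs a runnable version of that pipeline; the `StandardScaler` preprocessor and the toy data are assumptions standing in for the paper's actual preprocessing and the Titanic features.

```python
# Hedged reconstruction of the quoted Experiment Setup snippet.
# StandardScaler is a placeholder preprocessor (the paper's actual
# preprocessor is not shown); the data below is synthetic.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Tiny synthetic fit/predict to show the pipeline runs end to end.
X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
y = [0, 1, 0, 1]
model.fit(X, y)
preds = model.predict(X)
```

Fixing `random_state` in the classifier serves the same reproducibility goal as setting the LLM temperature to 0.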