MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Authors: Yaolun Zhang, Xiaogeng Liu, Chaowei Xiao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and can achieve a comparable performance with the human-designed multi-agent system, which is optimized for those specific tasks. The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/. |
| Researcher Affiliation | Academia | University of Wisconsin–Madison, Madison, US. Correspondence to: Chaowei Xiao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: FSM State Optimization; Algorithm 2: Deployment Stage |
| Open Source Code | Yes | The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/. |
| Open Datasets | Yes | Firstly, we compare MetaAgent with other prompt-based methods on Trivia Creative Writing (Wang et al., 2024d) and GPQA (Rein et al., 2023). Machine Learning Bench (ML-Bench) (Hong et al., 2024a) is a benchmark that requires agents to train a machine-learning model for regression or classification. |
| Dataset Splits | Yes | `# Load the dataset` `train_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_train.csv"` `eval_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_eval.csv"` |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used to run its experiments, such as GPU or CPU models. It only mentions the foundation model used (GPT-4o). |
| Software Dependencies | No | The paper lists several software libraries used in the code example (pandas, sklearn, etc.) but does not provide specific version numbers for these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We selected GPT-4o as the foundation model in the main experiments and set the temperature to 0 to ensure reproducibility. `model = RandomForestClassifier(n_estimators=100, random_state=0)` `model = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier(random_state=42))])` |
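The pipeline snippets quoted in the Experiment Setup row can be fleshed out into a runnable sketch. This is not the paper's actual code: the column names, toy data, and preprocessing choices (imputation, scaling, one-hot encoding) are illustrative assumptions standing in for the Titanic `split_train.csv` / `split_eval.csv` files, and only the `Pipeline` structure and `random_state` values mirror the quoted evidence.

```python
# Hedged reconstruction of the quoted scikit-learn setup: a preprocessing +
# RandomForestClassifier Pipeline with a fixed random_state, scored on a
# held-out evaluation split. The data below is a stand-in, not the benchmark.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative Titanic-like rows (the paper loads split_train.csv / split_eval.csv)
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0, 2.0, 27.0],
    "Fare": [7.25, 71.28, 7.92, 53.1, 8.05, 51.86, 21.08, 11.13],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})
X, y = data.drop(columns=["Survived"]), data["Survived"]
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocess numeric and categorical columns separately, then feed the forest.
preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]),
     ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
])
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(X_train, y_train)
accuracy = model.score(X_eval, y_eval)  # fraction of eval rows classified correctly
```

Fixing `random_state` in both the split and the classifier makes reruns deterministic, which matches the paper's stated goal of reproducibility (temperature 0 for the LLM, seeded estimators for the ML task).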