Flow: Modularized Agentic Workflow Automation

Authors: Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, Tongliang Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results across diverse practical tasks demonstrate significant improvements in the efficiency of multi-agent frameworks through dynamic workflow refinement and modularization.
Researcher Affiliation | Academia | 1 Sydney AI Centre, The University of Sydney; 2 The University of Adelaide; 3 Carnegie Mellon University; 4 Mohamed bin Zayed University of Artificial Intelligence
Pseudocode | Yes | Appendix D.2 provides pseudocode for updating the AOV graph: Algorithm 1 (Helper Function for Updating Graph) and Algorithm 2 (Flow).
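To make the AOV (activity-on-vertex) workflow idea concrete: in such a graph, each vertex is a subtask and edges encode prerequisites, so any valid execution order is a topological sort. The sketch below is purely illustrative (the class name, task names, and methods are invented here, not the paper's implementation) and covers only adding subtasks and recomputing an execution order, not Flow's full dynamic refinement.

```python
from graphlib import TopologicalSorter

class AOVGraph:
    """Toy activity-on-vertex task graph: vertices are subtasks,
    edges are prerequisite relations."""

    def __init__(self):
        self.deps = {}  # task -> set of prerequisite tasks

    def add_task(self, task, prerequisites=()):
        # Register the task and its prerequisites; prerequisites
        # without entries of their own are added as dependency-free tasks.
        self.deps.setdefault(task, set()).update(prerequisites)
        for p in prerequisites:
            self.deps.setdefault(p, set())

    def execution_order(self):
        # Topological order: every task appears after all its prerequisites.
        return list(TopologicalSorter(self.deps).static_order())

# Hypothetical subtasks for a website-design run.
g = AOVGraph()
g.add_task("design_layout")
g.add_task("write_html", prerequisites=("design_layout",))
g.add_task("write_css", prerequisites=("design_layout",))
g.add_task("integrate", prerequisites=("write_html", "write_css"))
print(g.execution_order())
```

Because `write_html` and `write_css` share no edge, a scheduler may run them concurrently, which is the kind of parallelism a modularized workflow graph exposes.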
Open Source Code | Yes | The code is available at: https://github.com/tmllab/2025_ICLR_FLOW.
Open Datasets | No | The paper's experimental tasks ('website design', 'LaTeX Beamer writing', and 'gobang game development') involve agent-generated content. It does not use or provide access to predefined publicly available datasets in the traditional sense (e.g., benchmark datasets with specific links or citations).
Dataset Splits | No | The paper's experiments are generative tasks (website design, LaTeX Beamer writing, gobang game development) with a stated number of trials per experiment (e.g., 'We conducted five trials'). It does not involve traditional train/validation/test splits, since the focus is on agents generating outputs for specific tasks rather than training models on pre-divided data.
Hardware Specification | No | The paper notes that agents were 'empowered by GPT-4o-mini and GPT-3.5-Turbo (OpenAI, 2024)' and discusses the 'Time Cost of Different Baseline' for these models, but it does not specify the underlying hardware (e.g., GPU models, CPU types, or cloud computing resources) used to run the LLM agents or the overall framework.
Software Dependencies | No | The paper states that agents are 'empowered by GPT-4o-mini and GPT-3.5-Turbo (OpenAI, 2024)', which are specific LLM models, but it does not detail other software dependencies, such as programming language versions (e.g., Python 3.x), specific libraries (e.g., PyTorch, TensorFlow), or other frameworks with version numbers required to reproduce the experimental environment.
Experiment Setup | Yes | Three diverse tasks evaluate multi-agent collaboration frameworks: 1) website design, 2) LaTeX Beamer writing, and 3) gobang game development. Flow is compared to existing multi-agent frameworks: (1) AutoGen, (2) CAMEL, and (3) MetaGPT. Agents are empowered by GPT-4o-mini and GPT-3.5-Turbo (OpenAI, 2024). The paper includes a sample prompt for initialization (P_init) and a prompt for update (P_update). Five trials were conducted and success scores recorded.