ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Authors: Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that ACC-Collab outperforms SotA multi-agent techniques on a wide array of benchmarks." (Section 5, Experiments)
Researcher Affiliation | Collaboration | Andrew Estornell (ByteDance Research), Jean-François Ton (ByteDance Research), Yuanshun Yao (Meta GenAI), Yang Liu (University of California, Santa Cruz)
Pseudocode | Yes | Algorithm 1: Trajectory generation and selection. Data: actor and critic parameters θ_a, θ_c; distribution of tasks D; reward threshold ε. Result: a dataset of trajectories D.
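Algorithm 1 as summarized above can be sketched as a simple filtering loop: roll out an actor-critic collaboration on sampled tasks and keep only trajectories whose reward clears the threshold ε. This is a hedged sketch, not the paper's implementation; `rollout` and `reward_fn` are hypothetical stand-ins for the collaboration rounds and reward computation.

```python
def collect_trajectories(actor, critic, rollout, reward_fn, tasks, epsilon, n_target):
    """Generate and select trajectories (sketch of Algorithm 1).

    actor, critic : the two agents (parameters θ_a, θ_c in the paper)
    rollout       : callable (actor, critic, task) -> trajectory; stands in
                    for the multi-agent collaboration rounds
    reward_fn     : callable (trajectory) -> scalar reward
    epsilon       : reward threshold for keeping a trajectory
    n_target      : stop once this many trajectories are collected
    """
    dataset = []
    for task in tasks:
        trajectory = rollout(actor, critic, task)
        if reward_fn(trajectory) >= epsilon:  # keep only high-reward rollouts
            dataset.append(trajectory)
        if len(dataset) >= n_target:
            break
    return dataset
```

The threshold ε trades off dataset size against trajectory quality: a higher ε yields fewer but higher-reward training examples.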
Open Source Code | Yes | Code available at https://github.com/LlenRotse/ACC-Collab
Open Datasets | Yes | Benchmarks: To evaluate the efficacy of ACC-Collab we make use of 5 standard benchmark tasks: BoolQ (Clark et al., 2019), 12k yes-no reading-comprehension questions; MMLU (Hendrycks et al., 2020), 15k multiple-choice questions covering a wide array of subjects and difficulty levels; BBH (Suzgun et al., 2022), 5k mixed-type questions; SciQ (Welbl et al., 2017), 13k multiple-choice science questions; ARC (Chollet, 2019), 7k multiple-choice reasoning-based questions.
Dataset Splits | Yes | Each dataset is split into a training set, a validation set, and a testing set. For datasets that come with an explicit partition we use the given partitions; this includes BoolQ, MMLU, SciQ, and ARC. For BBH, we randomly sample roughly 25% and 10% of the questions from each category to create a test and validation set, respectively; this comes out to 1260 questions for the test set and 500 questions for the validation set. All results are reported on questions in the test set.
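The BBH split described above (about 25% test and 10% validation, sampled per category) can be sketched as follows; the `bbh` mapping and function name are assumptions for illustration, not the paper's code.

```python
import random

def split_bbh(bbh, test_frac=0.25, val_frac=0.10, seed=0):
    """Per-category split: bbh maps category name -> list of questions.

    Samples ~test_frac of each category for the test set and ~val_frac
    for the validation set; the remainder goes to training.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for category, questions in bbh.items():
        qs = list(questions)
        rng.shuffle(qs)
        n_test = round(len(qs) * test_frac)
        n_val = round(len(qs) * val_frac)
        test.extend(qs[:n_test])
        val.extend(qs[n_test:n_test + n_val])
        train.extend(qs[n_test + n_val:])
    return train, val, test
```

Sampling within each category (rather than over the pooled questions) keeps every BBH task type represented in all three splits.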
Hardware Specification | Yes | Compute: All training was performed on a single NVIDIA H800 GPU. Inference for Llama-3- and Mistral-based models was performed on a single NVIDIA V100; for Gemma-2-based models we used a single NVIDIA H800.
Software Dependencies | No | The paper mentions the vLLM and trl libraries but does not specify their version numbers, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | Training for all models was performed via the trl library, using LoRA adapters of rank 256. When training ACC-Collab with DPO, we use a negative log-likelihood (NLL) regularization term (with weight 1), as outlined in Pang et al. (2024a).
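The DPO-plus-NLL objective mentioned above (Pang et al., 2024a) can be written per example as the standard DPO logistic loss plus a weighted NLL term on the chosen response. The sketch below is a plain-Python rendering of that loss under the usual DPO formulation, not the paper's trl training code; the function name and the log-probability inputs are assumptions.

```python
import math

def dpo_nll_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, alpha=1.0):
    """DPO loss with an NLL regularizer on the chosen response.

    logp_w, logp_l         : summed log-probs of the chosen/rejected responses
                             under the policy being trained
    ref_logp_w, ref_logp_l : the same under the frozen reference model
    beta                   : DPO temperature
    alpha                  : NLL regularization weight (1 in the setup above)
    """
    # Implicit reward margin between chosen and rejected responses.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    nll = -logp_w  # NLL of the chosen response
    return dpo + alpha * nll
```

In recent trl versions a term of this form is exposed via the `rpo_alpha` option of `DPOConfig` (version-dependent; worth checking against the trl release actually used).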