ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Authors: Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that ACC-Collab outperforms SotA multi-agent techniques on a wide array of benchmarks. (Section 5: EXPERIMENTS) |
| Researcher Affiliation | Collaboration | Andrew Estornell, ByteDance Research, EMAIL; Jean-François Ton, ByteDance Research, EMAIL; Yuanshun Yao, Meta GenAI, EMAIL; Yang Liu, University of California, Santa Cruz, EMAIL |
| Pseudocode | Yes | Algorithm 1: Trajectory generation and selection. Data: actor and critic θa, θc; distribution of tasks D; reward threshold ε. Result: a dataset of trajectories. |
| Open Source Code | Yes | Code available at https://github.com/LlenRotse/ACC-Collab |
| Open Datasets | Yes | Benchmarks: To evaluate the efficacy of ACC-Collab we make use of 5 standard benchmark tasks: BoolQ (Clark et al., 2019), 12k yes-no reading comprehension questions; MMLU (Hendrycks et al., 2020), 15k multiple-choice questions covering a wide array of subjects and difficulties; BBH (Suzgun et al., 2022), 5k mixed-type questions; SCIQ (Welbl et al., 2017), 13k multiple-choice science questions; ARC (Chollet, 2019), 7k multiple-choice reasoning-based questions. |
| Dataset Splits | Yes | Each dataset is split into a training set, a validation set, and a testing set. For datasets that come with an explicit partition of these sets we use the given partitions; this includes BoolQ, MMLU, SCIQ, and ARC. For BBH, we randomly sample roughly 25% and 10% of the questions from each category in BBH to create a test and validation set, respectively; this comes out to 1260 questions for the test set and 500 questions for the validation set. All results are reported on questions in the test set. |
| Hardware Specification | Yes | Compute: All training was performed on a single Nvidia H800 GPU. Inference for Llama-3- and Mistral-based models is performed on a single Nvidia V100 GPU; for Gemma-2-based models we used a single Nvidia H800. |
| Software Dependencies | No | The paper mentions "VLLM library" and "trl library" but does not specify their version numbers, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Training for all models was performed via the trl library, using LoRAs of size 256. When training ACC-Collab with DPO we use a negative log-likelihood (NLL) regularization term (with weight 1) as outlined in Pang et al. (2024a). |
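The trajectory generation and selection loop quoted from Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `actor`, `critic`, and `reward_fn` are hypothetical callables standing in for the actor-critic rollout and the reward signal, and the only faithful element is the selection rule that keeps trajectories whose reward clears the threshold ε.

```python
import random

def collect_trajectories(actor, critic, tasks, reward_fn, eps, n_samples=100):
    """Sketch of Algorithm 1: sample tasks from the distribution D,
    roll out the actor-critic pair, and keep only trajectories whose
    reward exceeds the threshold eps. All callables are placeholders."""
    dataset = []
    for _ in range(n_samples):
        task = random.choice(tasks)       # sample a task from D
        trajectory = actor(task, critic)  # actor-critic rollout (placeholder)
        r = reward_fn(trajectory)         # score the trajectory
        if r > eps:                       # selection step: keep high-reward runs
            dataset.append((task, trajectory, r))
    return dataset
```

With a toy actor and reward, every retained trajectory is guaranteed to have reward above ε, which is the property the selection step is meant to enforce.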
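The experiment setup combines a standard DPO preference loss with an NLL regularizer on the chosen response (weight 1), following Pang et al. (2024a). A per-example sketch of that combined objective, assuming summed log-probabilities of each response under the policy and a frozen reference model (all names and the β value are illustrative, not from the paper):

```python
import math

def dpo_nll_loss(logp_chosen, logp_rejected,
                 ref_logp_chosen, ref_logp_rejected,
                 beta=0.1, nll_weight=1.0):
    """Sketch of DPO with an NLL regularizer on the chosen response.

    dpo_term: -log sigmoid(beta * implicit-reward margin between
              chosen and rejected responses)
    nll_term: negative log-likelihood of the chosen response,
              scaled by nll_weight (1.0 in the setup described above)
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_term = -math.log(1.0 / (1.0 + math.exp(-margin)))
    nll_term = -logp_chosen
    return dpo_term + nll_weight * nll_term
```

When policy and reference agree (zero margin), the preference term reduces to log 2, and widening the chosen-vs-rejected margin lowers the loss; the NLL term additionally pushes up the absolute likelihood of the chosen response rather than just the relative margin.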