Communicating Activations Between Language Model Agents
Authors: Vignav Ramesh, Kenneth Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method with various functional forms f on two experimental setups (multi-player coordination games and reasoning benchmarks) and find that it achieves up to 27.0% improvement over natural language communication across datasets with <1/4 the compute, illustrating the superiority and robustness of activations as an alternative language for communication between LMs. |
| Researcher Affiliation | Academia | Kempner Institute for AI, Harvard University, Cambridge, MA, USA. Correspondence to: Vignav Ramesh <EMAIL>. |
| Pseudocode | No | The paper describes the method and procedure in prose and uses Figure 1 for illustration, but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link to source code, nor does it contain an explicit statement about the release of code in supplementary materials or other repositories. |
| Open Datasets | Yes | We validate our method by testing this approach with various functional forms f on two experimental setups: two multiplayer coordination games... and seven reasoning benchmarks spanning multiple domains: Biographies (Du et al., 2023), GSM8k (Cobbe et al., 2021), MMLU High School Psychology, MMLU Formal Logic, MMLU College Biology, MMLU Professional Law, and MMLU Public Relations (Hendrycks et al., 2021). |
| Dataset Splits | Yes | We evaluate on a randomly-sampled size-100 subset of each dataset. ... Indeed, we verify this hypothesis by training W on the GSM8k train set (to produce W_in-dist) and then evaluating with this task-specific linear layer on the GSM8k test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | Across all experiment configurations, we fix the decoding strategy to nucleus sampling with p = 0.9. ... In experiments involving the mapping matrix W, we instantiate W ∈ R^(4096×3072) using Xavier initialization and train for 10 epochs on a dataset of 3072 sentences... We use batch size 32 and the Adam optimizer with learning rate 0.001. |
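The quoted setup for the mapping matrix W (a 4096-to-3072 linear map, Xavier initialization, 10 epochs, batch size 32, Adam with learning rate 0.001) can be sketched as below. This is a minimal reconstruction, not the authors' code: the training data here is random placeholder tensors standing in for the paper's 3072 paired sentence activations, and the MSE objective is an assumption, since the paper excerpt does not state the loss.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's setup description.
D_IN, D_OUT = 4096, 3072   # W maps sender activations in R^4096 to the receiver's R^3072 space
N_SENTENCES = 3072         # size of the training set quoted in the paper
BATCH, EPOCHS, LR = 32, 10, 1e-3

# Placeholder activation pairs; in the paper these would be LM hidden states
# for the same 3072 sentences under the sender and receiver models.
torch.manual_seed(0)
src = torch.randn(N_SENTENCES, D_IN)
tgt = torch.randn(N_SENTENCES, D_OUT)

# The linear map W, with Xavier initialization as described.
W = nn.Linear(D_IN, D_OUT, bias=False)
nn.init.xavier_uniform_(W.weight)

opt = torch.optim.Adam(W.parameters(), lr=LR)
loss_fn = nn.MSELoss()  # assumed objective; not specified in the excerpt

for epoch in range(EPOCHS):
    perm = torch.randperm(N_SENTENCES)  # reshuffle each epoch
    for i in range(0, N_SENTENCES, BATCH):
        idx = perm[i:i + BATCH]
        opt.zero_grad()
        loss = loss_fn(W(src[idx]), tgt[idx])
        loss.backward()
        opt.step()
```

At inference, the trained W would project a sender-model activation into the receiver model's residual-stream dimensionality before injection.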