Navigating Social Dilemmas with LLM-based Agents via Consideration of Future Consequences

Authors: Dung Nguyen, Hung Le, Kien Do, Sunil Gupta, Svetha Venkatesh, Truyen Tran

IJCAI 2025

Reproducibility Variable — Result — LLM Response
Research Type — Experimental: Our first set of experiments, in which the LLM is directly asked to make decisions, shows that agents considering future consequences exhibit sustainable behaviour and achieve high common rewards for the population. Extensive experiments in complex environments showed that the CFC-Agent can manage a sequence of LLM calls for reasoning and engage in communication to cooperate with others, better resolving the common dilemma. Finally, our analysis showed that considering future consequences not only affects the final decision but also improves the conversations between LLM-based agents toward a better resolution of social dilemmas.
Researcher Affiliation — Academia: Applied Artificial Intelligence Initiative (A2I2), Deakin University.
Pseudocode — No: The paper describes methods and a framework but does not include any clearly labeled pseudocode or algorithm blocks. It provides conceptual diagrams (Figures 1 and 2) to illustrate the agent's structure.
Open Source Code — No: The paper states: "The underlying LLMs are open-source large language models: (1) LLAMA-3.1-70B-it [Dubey et al., 2024]; (2) Qwen-2.5-72B-it [Team, 2024]." This refers to the models the authors used, not to code they released for their own method.
Open Datasets — No: The paper conducts experiments in the "Common Harvest" and "Gov Sim" environments [Piatti et al., 2024], which are game environments rather than datasets. It does not provide concrete access information (e.g., specific links, DOIs, repositories, or formal download citations) for any dataset used or generated during the experiments.
Dataset Splits — No: The paper mentions experimental settings such as a "two-player setting (20 runs)", a "setting with 9 agents", and a "maximum timestep of the game is 200". These are parameters of the simulation environment, not explicit training/validation/test splits of a pre-existing dataset.
Hardware Specification — No: The paper mentions using specific LLMs like "LLAMA-3.1-70B-it" and "Qwen-2.5-72B-it", implying the use of computational hardware, but it does not specify any particular GPU models, CPU types, or other hardware details used for running the experiments.
Software Dependencies — No: The paper states: "The underlying LLMs are open-source large language models: (1) LLAMA-3.1-70B-it [Dubey et al., 2024]; (2) Qwen-2.5-72B-it [Team, 2024]." These are the models being studied, not ancillary software components or libraries with specific version numbers (e.g., Python, PyTorch, CUDA) required to replicate the experimental environment.
Experiment Setup — Yes: In the Common Harvests environment, the agents have a memory of size H = 5, i.e., they can remember the 5 most recent experiences when making decisions, and all agents are augmented with rationale. ... We identified this range for LLAMA-3.1-70B-it (α_CFC^LLAMA ∈ [−0.6, 0.4]) and Qwen-2.5-72B-it (α_CFC^Qwen ∈ [−5.0, 5.0]) when intervening over layers l ∈ [20, 60].
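The bounded memory described in the setup (each agent keeps only its H = 5 most recent experiences) can be sketched with a fixed-size buffer. This is a minimal illustration, not the authors' implementation; the class and method names are hypothetical.

```python
from collections import deque


class ExperienceMemory:
    """Bounded agent memory: retains only the H most recent experiences."""

    def __init__(self, H: int = 5):
        # deque with maxlen automatically evicts the oldest entry
        # once more than H experiences have been added
        self.buffer = deque(maxlen=H)

    def add(self, experience):
        """Record a new experience, discarding the oldest if full."""
        self.buffer.append(experience)

    def recall(self):
        """Return the stored experiences, oldest first."""
        return list(self.buffer)


memory = ExperienceMemory(H=5)
for t in range(8):
    memory.add(f"observation at step {t}")
# After 8 additions only the 5 most recent steps (3..7) remain.
print(memory.recall())
```

With H = 5, an agent that has seen 8 observations retains only those from steps 3 through 7, matching the "5 most recent experiences" constraint quoted above.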