Navigating Social Dilemmas with LLM-based Agents via Consideration of Future Consequences
Authors: Dung Nguyen, Hung Le, Kien Do, Sunil Gupta, Svetha Venkatesh, Truyen Tran
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our first set of experiments, in which the LLM is directly asked to make decisions, shows that agents considering future consequences exhibit sustainable behaviour and achieve high common rewards for the population. Extensive experiments in complex environments showed that the CFC-Agent can manage a sequence of LLM calls for reasoning and engage in communication to cooperate with others to better resolve the common dilemma. Finally, our analysis showed that considering future consequences not only affects the final decision but also improves the conversations between LLM-based agents toward a better resolution of social dilemmas. |
| Researcher Affiliation | Academia | Applied Artificial Intelligence Initiative (A2I2), Deakin University |
| Pseudocode | No | The paper describes methods and a framework, but does not include any clearly labeled pseudocode or algorithm blocks. It provides conceptual diagrams (Figure 1 and 2) to illustrate the agent's structure. |
| Open Source Code | No | The paper states: "The underlying LLMs are open-source large language models: (1) LLAMA-3.1-70B-it [Dubey et al., 2024]; (2) Qwen-2.5-72B-it [Team, 2024]." This refers to models used by the authors, not code released by them for their own methodology. |
| Open Datasets | No | The paper conducts experiments in "Common Harvest" and "Gov Sim environments [Piatti et al., 2024]" which are described as types of game environments. However, it does not provide concrete access information (e.g., specific links, DOIs, repositories, or formal citations for dataset download) for any specific dataset used or generated during their experiments. |
| Dataset Splits | No | The paper mentions experimental settings such as "two-player setting (20 runs)", "setting with 9 agents", and "maximum timestep of the game is 200". These are parameters for the simulation environment, not explicit training/test/validation dataset splits of pre-existing data. |
| Hardware Specification | No | The paper mentions using specific LLMs like "LLAMA-3.1-70B-it" and "Qwen-2.5-72B-it", implying the use of computational hardware, but it does not specify any particular GPU models, CPU types, or other hardware details used for running the experiments. |
| Software Dependencies | No | The paper states: "The underlying LLMs are open-source large language models: (1) LLAMA-3.1-70B-it [Dubey et al., 2024]; (2) Qwen-2.5-72B-it [Team, 2024]." These are the models being used/studied, not ancillary software components or libraries with specific version numbers (e.g., Python, PyTorch, CUDA) required to replicate the experimental environment. |
| Experiment Setup | Yes | In the Common Harvests environment, the agents have a memory of size H = 5, i.e., they can remember the 5 most recent experiences to make decisions, and all agents are augmented with rationale. ... We identified this range for LLAMA-3.1-70B-it (α_CFC^LLAMA ∈ [−0.6, 0.4]) and Qwen-2.5-72B-it (α_CFC^Qwen ∈ [−5.0, 5.0]) when intervening over layers l ∈ [20, 60]. |
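To make the quoted setup concrete, the following is a minimal sketch of the two configuration details the paper reports: a bounded memory of the H = 5 most recent experiences, and the per-model CFC coefficient ranges applied over layers l ∈ [20, 60]. All variable names and the dictionary layout are illustrative assumptions, not the authors' released code (the paper releases none).

```python
from collections import deque

# Illustrative sketch only: the paper states agents remember the
# H = 5 most recent experiences when making decisions.
H = 5
memory = deque(maxlen=H)  # old experiences are evicted automatically

for t in range(8):
    memory.append(f"experience_{t}")
# After 8 steps, only experiences 3..7 remain in memory.

# CFC coefficient ranges as quoted in the paper, keyed by model
# (hypothetical structure; the intervention itself is not sketched here):
alpha_ranges = {
    "LLAMA-3.1-70B-it": (-0.6, 0.4),
    "Qwen-2.5-72B-it": (-5.0, 5.0),
}
intervened_layers = range(20, 61)  # layers l in [20, 60], inclusive
```

The `deque(maxlen=H)` idiom is one natural way to realise a fixed-size recency memory; the paper does not specify its actual data structure.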