Advantage Alignment Algorithms

Authors: Juan Duque, Milad Aghajohari, Timotheus Cooijmans, Razvan Ciuca, Tianyu Zhang, Gauthier Gidel, Aaron Courville

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our algorithms across a range of social dilemmas, achieving state-of-the-art cooperation and robustness against exploitation. [...] We apply PAA to the Commons Harvest Open environment in Melting Pot 2.0 (Agapiou et al., 2023), a high dimensional version of the tragedy of the commons social dilemma, achieving state-of-the-art results and showcasing the scalability and effectiveness of our methods.
Researcher Affiliation | Academia | University of Montreal & Mila EMAIL
Pseudocode | Yes | Algorithm 1 Advantage Alignment; Algorithm 2 Proximal Advantage Alignment
Open Source Code | Yes | Please refer to the code released with this paper for the exact implementation.
Open Datasets | Yes | We apply PAA to the Commons Harvest Open environment in Melting Pot 2.0 (Agapiou et al., 2023) [...] We consider the full history version of IPD, where a gated recurrent unit (GRU) policy conditions on the full trajectory of observations before sampling an action. [...] The Coin Game is a 3x3 grid world environment where two agents, red and blue, take turns collecting coins. [...] In the original Negotiation Game, two agents bargain over n types of items over multiple rounds.
Dataset Splits | No | The paper describes simulation environments and training procedures (e.g., 'trajectories of length 16', '50 episodes of length 16', 'train a GTrXL transformer for 34k steps') but does not provide specific train/test/validation splits for any static dataset.
Hardware Specification | Yes | All of our IPD experiments run in 50 minutes in a nvidia A100 gpu. [...] All of our Coin Game experiments run in 30 minutes in a nvidia A100 gpu. [...] All of our Negotiation Game experiments run in 1 hour on a nvidia A100 gpu. [...] In total, our Commons Harvest Open experiments last 24 hours on an nvidia L40s gpu.
Software Dependencies | No | The paper mentions software components and algorithms like Adam, PPO, GRU, GTrXL transformer, but does not provide specific version numbers for any libraries or dependencies (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Table 1: IPD Hyperparameters; Table 2: Coin Game Hyperparameters; Table 3: Negotiation Game Hyperparameters; Table 4: Commons Harvest Open Hyperparameters. These tables list specific values for parameters such as Learning Rate, Batch Size, Discount Factor, Entropy Beta, etc.