reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Advantage Alignment Algorithms

Authors: Juan Duque, Milad Aghajohari, Timotheus Cooijmans, Razvan Ciuca, Tianyu Zhang, Gauthier Gidel, Aaron Courville

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our algorithms across a range of social dilemmas, achieving state-of-the-art cooperation and robustness against exploitation. [...] We apply PAA to the Commons Harvest Open environment in Melting Pot 2.0 (Agapiou et al., 2023), a high dimensional version of the tragedy of the commons social dilemma, achieving state-of-the-art results and showcasing the scalability and effectiveness of our methods.
Researcher Affiliation	Academia	University of Montreal & Mila
Pseudocode	Yes	Algorithm 1 Advantage Alignment; Algorithm 2 Proximal Advantage Alignment
Open Source Code	Yes	Please refer to the code released with this paper for the exact implementation.
Open Datasets	Yes	We apply PAA to the Commons Harvest Open environment in Melting Pot 2.0 (Agapiou et al., 2023) [...] We consider the full history version of IPD, where a gated recurrent unit (GRU) policy conditions on the full trajectory of observations before sampling an action. [...] The Coin Game is a 3x3 grid world environment where two agents, red and blue, take turns collecting coins. [...] In the original Negotiation Game, two agents bargain over n types of items over multiple rounds.
Dataset Splits	No	The paper describes simulation environments and training procedures (e.g., 'trajectories of length 16', '50 episodes of length 16', 'train a GTrXL transformer for 34k steps') but does not provide specific train/test/validation splits for any static dataset.
Hardware Specification	Yes	All of our IPD experiments run in 50 minutes in a nvidia A100 gpu. [...] All of our Coin Game experiments run in 30 minutes in a nvidia A100 gpu. [...] All of our Negotiation Game experiments run in 1 hour on a nvidia A100 gpu. [...] In total, our Commons Harvest Open experiments last 24 hours on an nvidia L40s gpu.
Software Dependencies	No	The paper mentions software components and algorithms such as Adam, PPO, GRU, and the GTrXL transformer, but does not provide version numbers for any libraries or dependencies (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	Table 1: IPD Hyperparameters; Table 2: Coin Game Hyperparameters; Table 3: Negotiation Game Hyperparameters; Table 4: Commons Harvest Open Hyperparameters. These tables list specific values for parameters such as Learning Rate, Batch Size, Discount Factor, Entropy Beta, etc.