Strategic Knowledge Transfer

Authors: Max Olan Smith, Thomas Anthony, Michael P. Wellman

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental evaluation of these methods on general-sum grid-world games provides evidence about their advantages and limitations in comparison to standard PSRO.
Researcher Affiliation | Collaboration | Max Olan Smith (EMAIL), University of Michigan, Computer Science & Engineering, Ann Arbor, MI 48109-2121, USA; Thomas Anthony (EMAIL), DeepMind, 6 Pancras Square, London N1C 4AG, UK; Michael P. Wellman (EMAIL), University of Michigan, Computer Science & Engineering, Ann Arbor, MI 48109-2121, USA.
Pseudocode | Yes | Algorithm 1: Value Iteration: Q-Mixing; Algorithm 2: Policy-Space Response Oracles (Lanctot et al., 2017); Algorithm 3: Mixed-Oracles; Algorithm 4: Mixed-Opponents
Open Source Code | No | The paper does not contain an explicit statement by the authors about releasing source code for the methods described in this work (e.g., Q-Mixing, Mixed-Oracles, Mixed-Opponents). While it references a third-party open-source implementation of the Gathering environment (https://github.com/HumanCompatibleAI/multi-agent), this is not the authors' own code for their contributions.
Open Datasets | Yes | We first evaluate Q-Mixing on the Running With Scissors (RWS) grid-world game (Vezhnevets et al., 2020; Leibo et al., 2021). ... We compare the methods on two distinct games: RWS and Gathering (Leibo et al., 2021). ... Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, and Thore Graepel. Scalable evaluation of multi-agent reinforcement learning with Melting Pot. In 38th International Conference on Machine Learning (ICML), pages 6187–6199, 2021.
Dataset Splits | No | The paper describes training with replay buffers and evaluating performance over 300 simulated episodes per opponent policy, averaged across five random seeds. However, it does not specify traditional training/validation/test splits with explicit percentages or sample counts for a static dataset, as is common in supervised learning contexts.
Hardware Specification | No | The paper does not provide specific hardware details, such as exact GPU/CPU models, processor types, or memory amounts, used for running its experiments. It mentions the use of deep RL but gives no concrete hardware specifications.
Software Dependencies | No | The paper mentions algorithms and architectures such as Double DQN and LSTM, but it does not list software dependencies with version numbers (e.g., the Python version, or a deep learning framework such as PyTorch or TensorFlow with its version) that would be needed to replicate the experiments.
Experiment Setup | Yes | The LSTM has a memory size of 128, and the output is projected through a series of fully-connected layers with sizes [128, 64, 64, 9]. ... To train such a policy, an auxiliary reward of 1 is added each time the agent collects their preferred item.
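The Q-Mixing rule named in the pseudocode row (Algorithm 1) transfers knowledge by combining opponent-specific action values, weighted by a distribution over which opponent is being faced. A minimal sketch, assuming tabular per-opponent Q-values and a fixed opponent distribution; the function and variable names here are illustrative, not taken from the paper's (unreleased) code:

```python
import numpy as np

def q_mixing(q_tables, opponent_probs):
    """Mix per-opponent action values into one set of Q-values.

    q_tables: dict mapping opponent id -> array of shape (n_states, n_actions)
    opponent_probs: dict mapping opponent id -> probability of facing it
    """
    mixed = None
    for opp, q in q_tables.items():
        contribution = opponent_probs[opp] * q
        mixed = contribution if mixed is None else mixed + contribution
    return mixed

# Toy example: two opponents, 2 states x 3 actions each.
q_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
q_b = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
mixed = q_mixing({"a": q_a, "b": q_b}, {"a": 0.5, "b": 0.5})
greedy_actions = mixed.argmax(axis=1)  # act greedily w.r.t. the mixed values
```

Acting greedily with respect to the mixed values gives a single policy that hedges across the opponent distribution without retraining against each opponent.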
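The architecture quoted in the Experiment Setup row (an LSTM with memory size 128 feeding fully-connected layers of sizes [128, 64, 64, 9]) can be sketched as a PyTorch module. The observation dimension and the mapping of the final 9 outputs to discrete actions are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Sketch of the quoted setup: LSTM with memory size 128, then
    fully-connected layers of sizes [128, 64, 64, 9].
    obs_dim is illustrative; the paper does not state the input size here."""

    def __init__(self, obs_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, 128, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 9),  # assumed: 9 discrete actions
        )

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)  # out: (batch, seq, 128)
        return self.head(out), state

policy = RecurrentPolicy()
q_values, _ = policy(torch.zeros(1, 5, 64))  # batch of 1, sequence length 5
```

One output head per action is the usual arrangement for a Double DQN-style value network, which the Software Dependencies row notes the paper uses.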