Strategic Knowledge Transfer
Authors: Max Olan Smith, Thomas Anthony, Michael P. Wellman
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation of these methods on general-sum grid-world games provides evidence about their advantages and limitations in comparison to standard PSRO. |
| Researcher Affiliation | Collaboration | Max Olan Smith EMAIL University of Michigan Computer Science & Engineering Ann Arbor, MI 48109-2121, USA. Thomas Anthony EMAIL DeepMind 6 Pancras Square London N1C 4AG, UK. Michael P. Wellman EMAIL University of Michigan Computer Science & Engineering Ann Arbor, MI 48109-2121, USA. |
| Pseudocode | Yes | Algorithm 1: Value Iteration: Q-Mixing; Algorithm 2: Policy-Space Response Oracles (Lanctot et al., 2017); Algorithm 3: Mixed-Oracles; Algorithm 4: Mixed-Opponents |
| Open Source Code | No | The paper does not contain an explicit statement by the authors about releasing their source code for the methodologies described in this work (e.g., Q-Mixing, Mixed-Oracles, Mixed-Opponents). While it references a third-party open-source implementation for the Gathering environment ('https://github.com/HumanCompatibleAI/multi-agent'), this is not the authors' own code for their contributions. |
| Open Datasets | Yes | We first evaluate Q-Mixing on the Running With Scissors (RWS) grid-world game (Vezhnevets et al., 2020; Leibo et al., 2021). ... We compare the methods on two distinct games: RWS and Gathering (Leibo et al., 2021). ... Joel Z. Leibo, Edgar Dueñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, and Thore Graepel. Scalable evaluation of multi-agent reinforcement learning with melting pot. In 38th International Conference on Machine Learning, ICML, pages 6187-6199, 2021. |
| Dataset Splits | No | The paper describes the use of replay buffers for training and evaluating performance over '300 simulated episodes' for each opponent policy, averaged across 'five random seeds' for evaluation. However, it does not specify traditional training/validation/test dataset splits with explicit percentages or sample counts for a static dataset, which is common in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It mentions the use of 'Deep RL' but no concrete hardware specifications. |
| Software Dependencies | No | The paper mentions several algorithms and architectures like 'Double DQN' and 'LSTM', but it does not provide specific software dependencies with version numbers (e.g., Python version, specific deep learning framework like PyTorch or TensorFlow with their versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | The LSTM has a memory size of 128, and the output is projected through a series of fully-connected layers with sizes [128, 64, 64, 9]. ... To train such a policy, an auxiliary reward of 1 is added each time the agent collects their preferred item. |
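The Pseudocode row above cites Algorithm 1 (Q-Mixing) without reproducing it. As a reading aid, here is a minimal, hypothetical numpy sketch of the Q-Mixing combination rule as described in the paper: against an unknown opponent, Q-values learned against each known opponent policy are mixed under a belief over which opponent is being faced. The function name, array shapes, and toy values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def q_mixing(q_per_opponent, belief):
    """Mix opponent-conditional Q-values under a belief over opponents.

    q_per_opponent: (n_opponents, n_actions) array, one row of Q-values
                    per previously learned best response.
    belief:         (n_opponents,) probability vector psi(opponent | obs).
    Returns the mixed Q-values: Q_mix(a) = sum_i psi(i) * Q_i(a).
    """
    q = np.asarray(q_per_opponent, dtype=float)
    psi = np.asarray(belief, dtype=float)
    assert np.isclose(psi.sum(), 1.0), "belief must be a distribution"
    return psi @ q  # shape (n_actions,)

# Toy example: two known opponents, three actions (values are made up).
q_vs_opponent_a = [1.0, 0.0, -1.0]
q_vs_opponent_b = [-1.0, 1.0, 0.0]
mixed = q_mixing([q_vs_opponent_a, q_vs_opponent_b], belief=[0.5, 0.5])
best_action = int(np.argmax(mixed))  # greedy action under the mixture
```

With a uniform belief the mixed Q-values are the element-wise average of the two rows, so the greedy action can differ from the best response to either opponent alone.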
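The Experiment Setup row quotes the network shape (an LSTM with memory size 128 feeding fully-connected layers of sizes [128, 64, 64, 9]). The sketch below checks that this architecture produces one output per action; it is a shape-level illustration only, with random weights, an assumed observation dimension, and a hand-rolled standard LSTM cell rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (input/forget/cell/output gates)."""
    z = W @ x + U @ h + b            # stacked pre-activations, (4*hidden,)
    i, f, g, o = np.split(z, 4)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

obs_dim, hidden = 16, 128            # obs_dim is an assumption; hidden = 128 per the paper
W = rng.normal(0.0, 0.1, (4 * hidden, obs_dim))
U = rng.normal(0.0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

# Fully-connected head with output sizes [128, 64, 64, 9]; the final
# 9 outputs correspond to one value per discrete action.
sizes = [hidden, 128, 64, 64, 9]
fcs = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(m))
       for n, m in zip(sizes[:-1], sizes[1:])]

h = np.zeros(hidden)
c = np.zeros(hidden)
for _ in range(5):                   # a few dummy timesteps of random obs
    h, c = lstm_step(rng.normal(size=obs_dim), h, c, W, U, b)

x = h
for k, (Wk, bk) in enumerate(fcs):
    x = Wk @ x + bk
    if k < len(fcs) - 1:
        x = np.maximum(x, 0.0)       # ReLU on hidden layers (assumed)

print(x.shape)                       # one output per action
```

Running the sketch confirms the head maps the 128-dimensional LSTM state down to a 9-dimensional action-value vector.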