Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games
Authors: Chiu-Chou Lin, Wei-Chen Chiu, I-Chen Wu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across two racing games and seven Atari games, our techniques significantly improve the precision of zero-shot playstyle classification, achieving an accuracy exceeding 90% with fewer than 512 observation-action pairs, which corresponds to less than half an episode of these games. Furthermore, our experiments with 2048 and Go demonstrate the potential of discrete playstyle measures in puzzle and board games. |
| Researcher Affiliation | Academia | Chiu-Chou Lin EMAIL Department of Computer Science National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan Wei-Chen Chiu EMAIL Department of Computer Science National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan I-Chen Wu EMAIL Department of Computer Science National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan Research Center for Information Technology Innovation Academia Sinica, Taipei 11529, Taiwan |
| Pseudocode | Yes | Algorithm 1 Measuring Policy Diversity Input: Policy π, Environment E, Similarity measure M Input: Similarity threshold t, Number of trajectories N 1: Initialize S (store trajectories) and diverse trajectory count d = 0 2: for i = 1 to N do 3: Generate a trajectory τi ~ (π, E) 4: Set is_diverse = true 5: for each τj in S do 6: Compute similarity M(τi, τj) 7: if M(τi, τj) ≥ t then 8: is_diverse = false 9: break 10: end if 11: end for 12: if is_diverse then 13: d = d + 1 14: end if 15: Store τi in S 16: end for Output: Return d (diverse trajectory count) and N (total trajectories) |
| Open Source Code | No | It is crucial to clarify that our research did not involve the training of new encoder models. Instead, we leveraged three pretrained encoder models and corresponding datasets for each game, provided by Lin et al. (2021). The associated resources are available in their official release (https://paperswithcode.com/paper/an-unsupervised-video-game-playstyle-metric). The game details are listed in Table 1. |
| Open Datasets | Yes | Our study encompasses three distinct game platforms, as depicted in Figures 3a, 3b, and 3c: 1. TORCS: This racing game features stable, controlled rule-based AI players (Yoshida et al., 2017). 2. RGSK (Racing Game Starter Kit): This racing game is available on the Unity Asset Store (Juliani et al., 2020)... 3. Atari games with DRL agents: The dataset spans 7 different Atari games (Bellemare et al., 2013) from this platform. Each game includes 20 AI models, all of which demonstrate varied playstyles. These AI models originate from the DRL framework Dopamine (Castro et al., 2018). ... The Go dataset used in this study was sourced from Fox Go (Fox Go, 2024a;b) and provided by the team of the Mini Zero framework (Wu et al., 2024). |
| Dataset Splits | Yes | Our playstyle classification adheres to the zero-shot methodology. As depicted in Figure 3d, we start with a query dataset N, sampled from a playstyle Style_n. We then compare this to multiple reference datasets M, each sampled from a different playstyle Style_m. We perform 100 rounds of random subsampling for each playstyle; our primary performance metric for this task is the accuracy of playstyle classification. ... For each player, we collected 1000 episodes, using the first 500 as the reference dataset and the remaining 500 as separate query datasets. This resulted in a total of 5000 query datasets for the experiment. ... Another dataset includes 200 human players with Go skill ranging from 1 Dan to 9 Dan, each contributing 100 games to the query datasets and 100 games to the candidate datasets. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory, or cloud instances) are mentioned for running the experiments. The paper discusses training DRL agents and encoder models but does not specify the hardware used for these processes or for the evaluations. |
| Software Dependencies | No | The paper mentions using 'training code available on GitHub' for 2048 agents and the 'Mini Zero framework' for the Go dataset, as well as the 'Adam optimizer'. However, no specific version numbers for these or any other software dependencies (such as Python, PyTorch, TensorFlow, or CUDA) are provided, which are necessary for reproducibility. |
| Experiment Setup | Yes | We set the learning rate (α) to 0.01 and maintained all other default settings. ... We train the encoder with a batch size of 1024 over 100 iterations, each iteration including 1000 network updates with the Adam optimizer. The learning rate starts at 0.00025 and linearly decays to 0 according to the iteration number. The coefficient β in the vector quantization process is set to the commonly suggested value of 0.25 (van den Oord et al., 2017; Lin et al., 2021). The loss function for the policy head is cross-entropy, and the loss for the value head is mean squared error, with the loss coefficients of these two heads both set to 1. |
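The pseudocode quoted in the Pseudocode row above (Algorithm 1, Measuring Policy Diversity) translates almost directly into code. The sketch below is a minimal Python rendering, not the authors' implementation: the trajectory generator and similarity measure are left as caller-supplied callables, since the paper's similarity M comes from pretrained encoders that are not reproduced here.

```python
def count_diverse_trajectories(generate, similarity, threshold, n):
    """Algorithm 1 sketch: a trajectory counts as diverse if its
    similarity to every previously stored trajectory is below the
    threshold t; it is then stored either way."""
    stored, diverse = [], 0
    for _ in range(n):
        traj = generate()
        # The first trajectory is trivially diverse (nothing stored yet).
        if all(similarity(traj, prev) < threshold for prev in stored):
            diverse += 1
        stored.append(traj)
    return diverse, n

# Toy usage: trajectories as scalars, similarity = 1 - distance.
trajs = iter([0.0, 0.05, 1.0, 1.02, 2.0])
sim = lambda a, b: 1.0 - abs(a - b)
print(count_diverse_trajectories(lambda: next(trajs), sim, threshold=0.9, n=5))
# prints (3, 5): 0.05 and 1.02 are too similar to earlier trajectories
```

Returning both d and N, as in the algorithm's output line, lets the caller report diversity as a ratio d/N.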
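The zero-shot classification protocol in the Dataset Splits row (query dataset vs. multiple reference datasets, 100 rounds of random subsampling, accuracy as the metric) can be sketched as follows. This is an illustrative reconstruction under assumptions: `datasets` mapping style names to lists of samples, and a caller-supplied `similarity` standing in for the paper's perceptual similarity measure, are both hypothetical names.

```python
import random

def classify_playstyle(query, references, similarity):
    """Zero-shot assignment: pick the reference playstyle whose
    dataset is most similar to the query dataset (no classifier training)."""
    return max(references, key=lambda style: similarity(query, references[style]))

def classification_accuracy(datasets, similarity, sample_size, rounds=100, seed=0):
    """Accuracy over repeated random subsampling, mirroring the
    100-round protocol described in the paper."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        true_style = rng.choice(sorted(datasets))
        query = rng.sample(datasets[true_style], sample_size)
        refs = {s: rng.sample(datasets[s], sample_size) for s in datasets}
        correct += classify_playstyle(query, refs, similarity) == true_style
    return correct / rounds

# Toy usage: two well-separated styles, similarity = negative mean distance.
data = {"aggressive": [0.0] * 20, "cautious": [10.0] * 20}
mean_sim = lambda q, r: -abs(sum(q) / len(q) - sum(r) / len(r))
print(classification_accuracy(data, mean_sim, sample_size=5))
# prints 1.0: the styles are trivially separable in this toy setting
```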
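The Experiment Setup row specifies a learning rate that starts at 0.00025 and decays linearly to 0 over 100 iterations, and a total loss combining policy cross-entropy and value mean squared error with coefficients of 1. A minimal sketch of both pieces (function names are my own, not from the paper):

```python
def linear_decay_lr(iteration, total_iterations=100, base_lr=0.00025):
    """Linear schedule from the setup: 2.5e-4 at iteration 0,
    decaying to 0 at the final iteration."""
    return base_lr * (1.0 - iteration / total_iterations)

def combined_loss(policy_ce, value_mse, policy_coef=1.0, value_coef=1.0):
    """Total loss: cross-entropy for the policy head plus mean squared
    error for the value head, both weighted by 1 as described."""
    return policy_coef * policy_ce + value_coef * value_mse

print(linear_decay_lr(0))    # 0.00025
print(linear_decay_lr(100))  # 0.0
```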