Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Authors: Santiago Ontañón
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the theoretical properties of several variants of naïve sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naïve sampling outperforms the other sampling strategies. |
| Researcher Affiliation | Academia | Santiago Ontañón EMAIL Computer Science Department, Drexel University, Philadelphia, PA, USA |
| Pseudocode | Yes | Algorithm 1 Select And Expand Node(n0) |
| Open Source Code | Yes | "A snapshot of all the source code and data necessary to reproduce all the experiments presented in this paper can be downloaded from the author's website." Footnote 2: https://sites.google.com/site/santiagoontanonvillar/code/Naive Sampling-journal-2016source-code.zip |
| Open Datasets | Yes | "A snapshot of all the source code and data necessary to reproduce all the experiments presented in this paper can be downloaded from the author's website." The µRTS game simulator is used as the application domain; it is a deterministic and fully-observable RTS game (although it can be configured for partial observability or non-determinism). Footnote 1: https://github.com/santiontanon/microrts |
| Dataset Splits | No | The paper describes experimental setups involving different game maps and round-robin tournaments, but does not use traditional train/test/validation dataset splits in the machine learning sense. The experiments are conducted within a simulator, and specific game situations are used, but these are not referred to as dataset splits. |
| Hardware Specification | Yes | In the Java µRTS implementation of these algorithms, running MC search takes about 271ms to run 1000 playouts in 8x8 maps, and about 615ms in 12x12 maps on an Intel Core i7 3.1GHz (the sampling strategy makes little difference, since the main bottleneck is running the forward model to run the playouts). |
| Software Dependencies | No | The paper mentions 'Java µRTS implementation' and provides a link to the µRTS simulator (https://github.com/santiontanon/microrts), but does not specify version numbers for Java, µRTS, or any other libraries or frameworks. |
| Experiment Setup | Yes | We used ε0 = 0.8 (emphasizing exploration), εl = 0.4, and εg = 0 (emphasizing the fact that the global MAB is used for exploitation), since they achieved the best results in our experiments. We performed a series of round-robin tournaments involving six different AIs... In each round-robin tournament, each AI played 20 games against each other AI (10 times as player max, and 10 times as player min) in each of the 8 maps, resulting in a total of 2400 games per tournament. We played six such round-robin tournaments, using a computation budget of 500, 1000, 2000, 3000, 4000 and 5000 playouts per game cycle respectively. As the reward function, we use the result of running a Monte Carlo simulation of the game during 100 game cycles (using a random action selection strategy), and then applying an evaluation function to the resulting game state. As the evaluation function, we used one of the built-in functions in µRTS (SimpleSqrtEvaluationFunction3). |
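For context, the ε-parameterized naive sampling strategy whose settings are quoted in the table (ε0 choosing between exploration via per-variable local MABs and exploitation via a global MAB over previously seen macro-arms) can be sketched roughly as follows. This is an illustrative Python sketch under our own naming, not the paper's Java µRTS implementation:

```python
import random
from collections import defaultdict

class NaiveSampling:
    """Rough sketch of naive sampling for a CMAB (illustrative, not the paper's code).

    A macro-arm is a tuple with one value per decision variable. With
    probability eps0 we explore: each variable's value is chosen by its own
    eps-greedy local MAB (parameter eps_l). Otherwise we exploit: an
    eps-greedy global MAB (parameter eps_g) picks among macro-arms sampled so far.
    """

    def __init__(self, arms_per_variable, eps0=0.8, eps_l=0.4, eps_g=0.0):
        self.arms = arms_per_variable  # list of lists: legal values per variable
        self.eps0, self.eps_l, self.eps_g = eps0, eps_l, eps_g
        self.local = defaultdict(lambda: [0.0, 0])    # (var index, value) -> [reward sum, count]
        self.global_ = defaultdict(lambda: [0.0, 0])  # macro-arm tuple -> [reward sum, count]

    def _eps_greedy(self, eps, candidates, stats):
        # Explore with probability eps, or when nothing has been tried yet;
        # otherwise pick the candidate with the highest average reward.
        if random.random() < eps or not any(stats[c][1] for c in candidates):
            return random.choice(candidates)
        return max(candidates,
                   key=lambda c: stats[c][0] / stats[c][1] if stats[c][1] else float("-inf"))

    def sample(self):
        if random.random() < self.eps0 or not self.global_:
            # Explore: assemble a macro-arm variable by variable via the local MABs.
            return tuple(
                self._eps_greedy(self.eps_l, [(i, v) for v in vals], self.local)[1]
                for i, vals in enumerate(self.arms)
            )
        # Exploit: eps-greedy over macro-arms already recorded in the global MAB.
        return self._eps_greedy(self.eps_g, list(self.global_.keys()), self.global_)

    def update(self, macro_arm, reward):
        # Credit the observed reward to each variable's local MAB and to the global MAB.
        for i, v in enumerate(macro_arm):
            s = self.local[(i, v)]
            s[0] += reward
            s[1] += 1
        g = self.global_[macro_arm]
        g[0] += reward
        g[1] += 1
```

With εg = 0, as in the quoted setup, the global MAB is purely greedy, matching the paper's remark that it is used for exploitation.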