Large-Scale Optimization for Evaluation Functions with Minimax Search

Authors: K. Hoki, T. Kaneko

JAIR 2014

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Experimental results show that the large-scale optimization of the evaluation function improves the playing strength of shogi programs, and the new method performs significantly better than other methods. Implementation of the new method in our shogi program Bonanza made substantial contributions to the program's first-place finish in the 2013 World Computer Shogi Championship."
Researcher Affiliation: Academia. Kunihito Hoki (EMAIL), Department of Communication Engineering and Informatics, The University of Electro-Communications; Tomoyuki Kaneko (EMAIL), Department of Graphics and Computer Sciences, The University of Tokyo.
Pseudocode: Yes. Figure 2 (Minimax Tree Optimization: iteration of searches and updates using partial derivatives):
1. Perform a game-tree search to identify the PV leaf π_p.m(w^(t)) for each child position p.m of every position p in the training set P, where w^(t) is the weight vector at the t-th iteration and w^(0) is the initial guess.
2. Calculate a partial-derivative approximation of the well-modeled objective function defined in Section 3.1 by using both π_p.m(w^(t)) and w^(t). The objective function employs a differentiable approximation of H(x) (see Section 3.1), as well as a constraint and a regularization term (see Section 3.2).
3. Obtain the new weight vector w^(t+1) from w^(t) by using a grid-adjacent update guided by the partial derivatives computed in step 2 (see Section 3.4). Go back to step 1, or terminate the optimization when the objective function value converges (see Section 4).
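The three steps above can be sketched in a few lines of Python. This is a hedged illustration only, not Bonanza's actual learn1.c/learn2.c implementation: it assumes a linear evaluation e = w·φ over precomputed PV-leaf feature vectors, and the pair representation `pv_leaf_pairs` is a hypothetical stand-in for the game-tree searches of step 1.

```python
import numpy as np

A = 0.0273  # sigmoid slope reported in the paper's experiment setup

def T(x, a=A):
    """Horizontally mirrored sigmoid: decreasing in x, with T(0) = 0.5."""
    return 1.0 / (1.0 + np.exp(a * x))

def grid_adjacent_update(w, pv_leaf_pairs, h=1):
    """One MMTO iteration over precomputed PV-leaf features (steps 2-3).

    pv_leaf_pairs: list of (phi_d, phi_m) pairs, where phi_d holds the
    features of the PV leaf below the desired move and phi_m those below
    another legal move (hypothetical representation of step 1's output).
    The per-pair loss is T(e_d - e_m); minimizing it pushes the desired
    move's evaluation above the alternative's. The constraint and
    regularization terms of Section 3.2 are omitted for brevity.
    """
    grad = np.zeros_like(w, dtype=float)
    for phi_d, phi_m in pv_leaf_pairs:
        x = float(w @ phi_d - w @ phi_m)        # evaluation difference
        dT = -A * T(x) * (1.0 - T(x))           # T'(x)
        grad += dT * (phi_d - phi_m)            # chain rule, linear e
    # Grid-adjacent update: each integer weight moves by at most +-h
    # (h = 1 in the paper), guided only by the sign of the derivative.
    return w - h * np.sign(grad).astype(w.dtype)
```

With a single pair in which the desired leaf activates one feature and the alternative another, one update raises the desired feature's weight and lowers the other's, as the grid-adjacent rule intends.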
Open Source Code: Yes. "Most of the experiments described in this section used Bonanza, whose source code is available online (Hoki & Muramatsu, 2012). The source code of various versions of Bonanza is available online (Hoki, 2013), and the source code of MMTO is in two files, learn1.c and learn2.c. One interesting case is the results of GPS Shogi (Kaneko, 2009), the winner of the 2009 and 2012 tournaments, whose source code is available online (Tanaka & Kaneko, 2013)."
Open Datasets: No. "The game records in the training and test sets were exclusively chosen from games played in famous tournaments. We conducted preliminary experiments on chess to catch a glimpse of the applicability of MMTO to other games. ... The training and test sets were composed by using game records at the Free Internet Chess Server (FICS). These games were played using the standard time control of the server by two players with ratings of 2,600 or more. Each training set was composed from the records of 47,566 rapid time control (30 seconds per move) games played by amateurs on a popular Internet shogi site, Shogi Club 24."
Dataset Splits: Yes. "The main training set P had 4,467,116 desired moves and Z_P = 368,313,024 move pairs after removing duplications and handicap games from the 47,566 game records. The test set had 103,105 desired moves after removing duplications and handicap games from another 1,000 game records."
Hardware Specification: Yes. "This computation took about a week using an Intel X5690 workstation. The agreement ratio with the test set converged in 100 iterations. Bonanza using e_ABCD searched about 3 × 10^6 nodes/sec on an Intel Xeon X5680 with 12 threads. Each player was allowed to use one second for each move, and fifty megabytes of memory were assigned to the transposition table."
Software Dependencies: No. The paper mentions several techniques and tools, such as PVS, quiescence search, transposition tables, static exchange evaluation, killer and history heuristics, null move pruning, futility pruning, and late move reductions. However, it does not provide specific version numbers for any of the software components, programming languages, or libraries used in the implementation, which limits reproducibility.
Experiment Setup: Yes. "To normalize the objective function values, the objective function values were divided by the total number of move pairs, Z_P = Σ_{p∈P} |M_p|. The constraint function was set to 6,500. Also, in accordance with the magnitude of the constraint, a in the horizontally mirrored sigmoid function T(x) = 1/(1 + exp(ax)) was set to 0.0273 so that T(x) would vary significantly if x changed by a hundred. The regularization term was J_R(w) = 0.00625 (|w_B| + |w_C| + |w_D|). An intuitive explanation of the penalty strength is that the absolute value of w_i can be increased to 160 if doing so improves the relationship between the evaluation values of a desired move and another legal move. The sums in e_B, e_C, and e_D were computed using 32-bit integers, and they were divided by 2^5 in order to fit the evaluation value into a 16-bit integer. Step h in the grid-adjacent update was set to the smallest integer value, 1."
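The reported constants can be sanity-checked numerically. The snippet below is a sketch that assumes the mirrored sigmoid and the penalty coefficient exactly as quoted: it shows that a 100-point change in the evaluation difference moves T(x) appreciably (by roughly 0.44), and that raising one weight by 160 incurs a penalty of 160 × 0.00625 = 1.0, which matches the maximum possible improvement of a single move-pair term, since T is bounded in (0, 1).

```python
import math

A = 0.0273       # sigmoid slope from the experiment setup
C_REG = 0.00625  # l1 regularization coefficient from the experiment setup

def T(x):
    """Horizontally mirrored sigmoid used in the MMTO objective."""
    return 1.0 / (1.0 + math.exp(A * x))

# Sensitivity: with a = 0.0273, shifting x by 100 moves T from 0.5
# to about 0.06, i.e. a change of roughly 0.44 per move pair.
delta = T(0) - T(100)

# Penalty intuition: increasing one weight by 160 costs 160 * 0.00625
# = 1.0, the largest gain any single move-pair term can contribute.
penalty = 160 * C_REG
```

This illustrates why 0.00625 is a natural coefficient: it sets the break-even point at a weight change of 1/0.00625 = 160.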