Large-Scale Optimization for Evaluation Functions with Minimax Search

Authors: K. Hoki, T. Kaneko

JAIR 2014

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Experimental results show that the large-scale optimization of the evaluation function improves the playing strength of shogi programs, and the new method performs significantly better than other methods. Implementation of the new method in our shogi program Bonanza made substantial contributions to the program's first-place finish in the 2013 World Computer Shogi Championship."
Researcher Affiliation: Academia. Kunihito Hoki (EMAIL), Department of Communication Engineering and Informatics, The University of Electro-Communications; Tomoyuki Kaneko (EMAIL), Department of Graphics and Computer Sciences, The University of Tokyo.
Pseudocode: Yes. Figure 2 (Minimax Tree Optimization: iteration of searches and updates using partial derivatives):
1. Perform a game-tree search to identify the PV leaf π_p.m(w^(t)) for each child position p.m of every position p in the training set P, where w^(t) is the weight vector at the t-th iteration and w^(0) is the initial guess.
2. Calculate a partial-derivative approximation of the well-modeled objective function defined in Section 3.1 by using both π_p.m(w^(t)) and w^(t). The objective function employs a differentiable approximation of H(x) (see Section 3.1), as well as a constraint and a regularization term (see Section 3.2).
3. Obtain the new weight vector w^(t+1) from w^(t) by using a grid-adjacent update guided by the partial derivatives computed in step 2 (see Section 3.4). Go back to step 1, or terminate the optimization when the objective function value converges (see Section 4).
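The three steps above can be sketched in a few lines of Python. This is a hedged illustration only, not Bonanza's actual learn1.c/learn2.c implementation: it assumes a linear evaluation e = w·φ over precomputed PV-leaf feature vectors, and the pair representation `pv_leaf_pairs` is a hypothetical stand-in for the game-tree searches of step 1.

```python
import numpy as np

A = 0.0273  # sigmoid slope reported in the paper's experiment setup

def T(x, a=A):
    """Horizontally mirrored sigmoid: decreasing in x, with T(0) = 0.5."""
    return 1.0 / (1.0 + np.exp(a * x))

def grid_adjacent_update(w, pv_leaf_pairs, h=1):
    """One MMTO iteration over precomputed PV-leaf features (steps 2-3).

    pv_leaf_pairs: list of (phi_d, phi_m) pairs, where phi_d holds the
    features of the PV leaf below the desired move and phi_m those below
    another legal move (hypothetical representation of step 1's output).
    The per-pair loss is T(e_d - e_m); minimizing it pushes the desired
    move's evaluation above the alternative's. The constraint and
    regularization terms of Section 3.2 are omitted for brevity.
    """
    grad = np.zeros_like(w, dtype=float)
    for phi_d, phi_m in pv_leaf_pairs:
        x = float(w @ phi_d - w @ phi_m)        # evaluation difference
        dT = -A * T(x) * (1.0 - T(x))           # T'(x)
        grad += dT * (phi_d - phi_m)            # chain rule, linear e
    # Grid-adjacent update: each integer weight moves by at most +-h
    # (h = 1 in the paper), guided only by the sign of the derivative.
    return w - h * np.sign(grad).astype(w.dtype)
```

With a single pair in which the desired leaf activates one feature and the alternative another, one update raises the desired feature's weight and lowers the other's, as the grid-adjacent rule intends.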
Open Source Code: Yes. "Most of the experiments described in this section used Bonanza, whose source code is available online (Hoki & Muramatsu, 2012). The source code of various versions of Bonanza is available online (Hoki, 2013), and the source code of MMTO is in two files, learn1.c and learn2.c. One interesting case is the results of GPS Shogi (Kaneko, 2009), the winner of the 2009 and 2012 tournaments, whose source code is available online (Tanaka & Kaneko, 2013)."
Open Datasets: No. "The game records in the training and test sets were exclusively chosen from games played in famous tournaments. We conducted preliminary experiments on chess to catch a glimpse of the applicability of MMTO to other games. ... The training and test sets were composed by using game records at the Free Internet Chess Server (FICS). These games were played using the standard time control of the server by two players with ratings of 2,600 or more. Each training set was composed from the records of 47,566 rapid time control (30 seconds per move) games played by amateurs on a popular Internet shogi site, Shogi Club 24."
Dataset Splits: Yes. "The main training set P had 4,467,116 desired moves and Z_P = 368,313,024 move pairs after removing duplications and handicap games from the 47,566 game records. The test set had 103,105 desired moves after removing duplications and handicap games from another 1,000 game records."
Hardware Specification: Yes. "This computation took about a week using an Intel X5690 workstation. The agreement ratio with the test set converged in 100 iterations. Bonanza using e_ABCD searched about 3 × 10^6 nodes/sec on an Intel Xeon X5680 with 12 threads. Each player was allowed to use one second for each move, and fifty megabytes of memory were assigned to the transposition table."
Software Dependencies: No. The paper mentions several techniques and tools, such as PVS, quiescence search, transposition tables, static exchange evaluation, killer and history heuristics, null move pruning, futility pruning, and late move reductions. However, it does not provide specific version numbers for any of the software components, programming languages, or libraries used in the implementation, which limits reproducibility.
Experiment Setup: Yes. "To normalize the objective function values, the objective function values were divided by the total number of move pairs, Z_P = Σ_{p∈P} |M_p|. The constraint function was set to 6,500. Also, in accordance with the magnitude of the constraint, a in the horizontally mirrored sigmoid function T(x) = 1/(1 + exp(ax)) was set to 0.0273 so that T(x) would vary significantly if x changed by a hundred. The regularization term was J_R(w) = 0.00625 (|w_B| + |w_C| + |w_D|). An intuitive explanation of the penalty strength is that the absolute value of w_i can be increased to 160 if doing so improves the relationship between the evaluation values of a desired move and another legal move. The sums in e_B, e_C, and e_D were computed using 32-bit integers, and they were divided by 2^5 in order to fit the evaluation value into a 16-bit integer. Step h in the grid-adjacent update was set to the smallest integer value, 1."
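The reported constants can be sanity-checked numerically. The snippet below is a sketch that assumes the mirrored sigmoid and the penalty coefficient exactly as quoted: it shows that a 100-point change in the evaluation difference moves T(x) appreciably (by roughly 0.44), and that raising one weight by 160 incurs a penalty of 160 × 0.00625 = 1.0, which matches the maximum possible improvement of a single move-pair term, since T is bounded in (0, 1).

```python
import math

A = 0.0273       # sigmoid slope from the experiment setup
C_REG = 0.00625  # l1 regularization coefficient from the experiment setup

def T(x):
    """Horizontally mirrored sigmoid used in the MMTO objective."""
    return 1.0 / (1.0 + math.exp(A * x))

# Sensitivity: with a = 0.0273, shifting x by 100 moves T from 0.5
# to about 0.06, i.e. a change of roughly 0.44 per move pair.
delta = T(0) - T(100)

# Penalty intuition: increasing one weight by 160 costs 160 * 0.00625
# = 1.0, the largest gain any single move-pair term can contribute.
penalty = 160 * C_REG
```

This illustrates why 0.00625 is a natural coefficient: it sets the break-even point at a weight change of 1/0.00625 = 160.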