reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Strength Estimation and Human-Like Strength Adjustment in Games

Authors: Chun-Jung Chen, Chung-Chin Shih, Ti-Rong Wu

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We first conduct experiments in Go, a challenging board game with a wide range of ranks. Our strength estimator significantly achieves over 80% accuracy in predicting ranks by observing 15 games only, whereas the previous method reached 49% accuracy for 100 games. For strength adjustment, SE-MCTS successfully adjusts to designated ranks while achieving a 51.33% accuracy in aligning to human actions, outperforming a previous state-of-the-art, with only 42.56% accuracy. To demonstrate the generality of our strength system, we further apply SE and SE-MCTS to chess and obtain consistent results.
Researcher Affiliation	Academia	Chun-Jung Chen (1, 2), Chung-Chin Shih (1), Ti-Rong Wu (1). (1) Institute of Information Science, Academia Sinica, Taiwan; (2) Department of Computer Science, National Taiwan University, Taiwan. Corresponding author: EMAIL
Pseudocode	No	The paper describes methods and processes using mathematical formulations and textual explanations within sections like '3.1 STRENGTH ESTIMATOR' and '3.3 STRENGTH ESTIMATOR BASED MCTS FOR STRENGTH ADJUSTMENT', but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	Our code is available at https://rlg.iis.sinica.edu.tw/papers/strength-estimator. ... The source code, along with a README file containing instructions is available at https://rlg.iis.sinica.edu.tw/papers/strength-estimator.
Open Datasets	Yes	The human games are collected from Fox Weiqi (Foxwq, 2025), which is the largest online Go platform in terms of users. ... We downloaded Go games from the Fox Weiqi online platform using its public download links. ... The games were collected from Lichess (Lichess, 2025), which uses Elo ratings as its ranking system. ... Lichess is one of the most popular online chess platforms, with millions of active users.
Dataset Splits	Yes	For the training dataset, we collect a total of 495,000 games, with 45,000 games from each rank. We also prepare a separate testing dataset, including a candidate and a query dataset. The candidate dataset is used to estimate an average strength score of each rank, including a total of 1,100 games, with 100 games per rank. The query dataset is used for the strength estimator to predict the strength, containing a total of 9,900 games, with 900 games per rank. ... For the testing dataset, the candidate dataset consists of 960 games, with 120 games per rank, while the query dataset contains 9,600 games, with 1,200 games per rank.
Hardware Specification	Yes	GPU: NVIDIA RTX A5000; GPU Hours: 242, 69; Main Memory: 384GB; Central Processing Unit (CPU): Intel Xeon Silver 4216 (2.1 GHz)
Software Dependencies	No	The paper mentions 'stochastic gradient descent (SGD)' as the optimizer and refers to the 'MiniZero framework' and 'AlphaZero network' for model architecture and features, but it does not specify version numbers for any libraries, programming languages, or software tools used for implementation.
Experiment Setup	Yes	The network architecture of the strength estimator is similar to the AlphaZero network, consisting of 20 residual blocks with 256 channels. ... During training, we aggregate the composite strength score βi by randomly selecting m = 7 state-action pairs from ri. Other training details are provided in the appendix. ... The learning rate is initially set at 0.01 and is halved after 100,000 training steps. The entire training process encompasses 130,000 steps... Table 3: Number of Blocks 20, Input Channel 18/119, Hidden Channel 256, Learning Rate 0.01 to 0.005, Training Steps 130,000, Optimizer SGD.
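
The figures quoted in the Dataset Splits and Experiment Setup rows can be sanity-checked with a short Python sketch. It is an illustration only, not the authors' code: the names `SPLITS`, `implied_ranks`, and `learning_rate` are hypothetical, the "first" and "second" labels simply follow the order in which the two sets of figures are quoted, and the exact step at which the learning rate is halved (zero-indexed step 100,000 here) is an assumption, since the quoted text only says "after 100,000 training steps".

```python
# Figures quoted in the "Dataset Splits" row, as (total games, games per rank).
# The group labels are hypothetical; they only reflect the order of quotation.
SPLITS = {
    "first": {
        "training": (495_000, 45_000),
        "candidate": (1_100, 100),
        "query": (9_900, 900),
    },
    "second": {
        "candidate": (960, 120),
        "query": (9_600, 1_200),
    },
}


def implied_ranks(splits):
    """Return the number of ranks implied by each group of splits.

    Raises ValueError if a total is not a whole multiple of its per-rank
    count, or if the splits within one group imply different rank counts.
    """
    result = {}
    for group, parts in splits.items():
        counts = set()
        for name, (total, per_rank) in parts.items():
            ranks, remainder = divmod(total, per_rank)
            if remainder:
                raise ValueError(f"{group}/{name}: {total} is not a multiple of {per_rank}")
            counts.add(ranks)
        if len(counts) != 1:
            raise ValueError(f"{group}: inconsistent rank counts {sorted(counts)}")
        result[group] = counts.pop()
    return result


def learning_rate(step, base_lr=0.01, halve_after=100_000):
    """Learning rate at a zero-indexed training step.

    Assumption: the first `halve_after` steps use `base_lr`, and every later
    step uses half of it (0.01 to 0.005 with the quoted defaults).
    """
    return base_lr / 2 if step >= halve_after else base_lr


if __name__ == "__main__":
    print(implied_ranks(SPLITS))
    for step in (0, 99_999, 100_000, 129_999):
        print(step, learning_rate(step))
```

Under these assumptions the totals are internally consistent: each group of splits implies a single rank count, and the schedule matches the "0.01 to 0.005" entry quoted from Table 3 over the 130,000 training steps.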