Implicit Search via Discrete Diffusion: A Study on Chess
Authors: Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive controlled experiments, we show DIFFUSEARCH outperforms both the searchless and explicit search-enhanced policies. Specifically, DIFFUSEARCH outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% on action accuracy. Furthermore, DIFFUSEARCH demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment. |
| Researcher Affiliation | Collaboration | 1 The University of Hong Kong 2 Shanghai Jiao Tong University 3 Huawei Noah's Ark Lab 4 Shanghai AI Laboratory |
| Pseudocode | Yes | Algorithm 1 (DIFFUSEARCH Training). Input: dataset D = {(s, (a, z))}, neural network f(·; θ), timesteps T. Output: model parameters θ. Denote state length l = \|s\|. Repeat: draw (s, (a, z)) ~ D and obtain x_{0,1:N} = s ∥ a ∥ z (∥: concat); draw t ~ Uniform({1, …, T}); draw x_{t,n} ~ q(x_{t,n} \| x_{0,n}) for n ∈ {l+1, …, N}; compute L(θ) = −λ_t Σ_{n=l+1}^{N} 1[x_{t,n} ≠ x_{0,n}] x_{0,n}^⊤ log f(x_{t,n}; θ); minimize L(θ) with respect to θ; until converged. Algorithm 2 (DIFFUSEARCH Inference). Input: board state s, trained network f(·; θ), timesteps T. Output: next action a. Denote state length l = \|s\|. Initialize x_{T,1:l} = s and x_{T,l+1:N} ~ q_noise. For t = T, …, 1: for n = l+1, …, N: draw x̃_{0,n} ~ Cat(f(x_{t,n}; θ)); draw x_{t−1,n} ~ q(x_{t−1,n} \| x_{t,n}, x̃_{0,n}). Return a = x_{0,l+1}. |
| Open Source Code | Yes | All code is publicly available at https://github.com/HKUNLP/DiffuSearch. |
| Open Datasets | Yes | We construct a dataset for supervised training by downloading games from lichess recorded in February 2023. We utilize Stockfish 16, currently the world's strongest search-based engine, as an oracle to label board states extracted from randomly selected games on lichess.org. |
| Dataset Splits | Yes | Table 1 (data statistics; records / games per stage): Train SA-V (100k): 193,189,573 / 100,000; Train SA-V (10k): 17,448,268 / 10,000; Train others (100k): 6,564,661 / 100,000; Train others (10k): 659,576 / 10,000; Action Test: 62,561 / 1,000; Puzzle Test: 36,816 / 10,000. |
| Hardware Specification | Yes | All experiments are done on 8 NVIDIA V100 32G GPUs. |
| Software Dependencies | No | No specific software versions for dependencies like PyTorch, TensorFlow, or Python are mentioned. Although GPT-2 transformer architecture and Adam optimizer are mentioned, their versions are not specified. Stockfish 16 is mentioned as an oracle, not a dependency of their code. |
| Experiment Setup | Yes | We train all baseline models until convergence and set a maximum of 200 epochs for diffusion models due to their slow convergence. We use the Adam optimizer (Kingma & Ba, 2015), a learning rate of 3e-4, and a batch size of 1024 for all models. By default, we set the horizon h to be 4, the number of network layers to be 8 (with a total parameter size of 7M), the diffusion timesteps to be 20, and an absorbing noise type. By default, 100 simulations are utilized in the MCTS-enhanced policy, and its impact is analyzed in Figure 3. We adjust c_puct and τ, constants determining the level of exploration in MCTS, on a held-out set and set them to c_puct = 0.1 and τ = 1 for their superior performance. |
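The two algorithms quoted in the Pseudocode row can be sketched in code. The following is a minimal NumPy toy, not the authors' implementation: it assumes a linear absorbing-mask schedule (mask probability t/T), a λ_t = 1/t loss weight, and a tiny linear stand-in for the paper's GPT-2-style network f(·; θ); the vocabulary size, sequence lengths, and all helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # toy vocabulary size (real model: chess state/action tokens)
MASK = V       # absorbing [MASK] token id
T = 20         # diffusion timesteps (matches the paper's default of 20)
L_STATE = 4    # length l of the conditioning state s
N = 10         # total sequence length of x_0 = s || a || z

def q_sample(x0, t):
    """Absorbing forward process (assumed linear schedule): each future
    token is independently masked with probability t/T; s stays fixed."""
    xt = x0.copy()
    masked = rng.random(N) < t / T
    masked[:L_STATE] = False              # never corrupt the state s
    xt[masked] = MASK
    return xt

def f_theta(xt, W):
    """Stand-in denoising network: per-position softmax over the vocab.
    (The paper uses an 8-layer GPT-2-style transformer; this is a toy.)"""
    onehot = np.eye(V + 1)[xt]            # (N, V+1)
    logits = onehot @ W                   # (N, V)
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def training_loss(x0, W):
    """One Monte Carlo draw of the Algorithm-1 objective: weighted
    cross-entropy of x_0 at the positions corrupted at a random t."""
    t = rng.integers(1, T + 1)
    xt = q_sample(x0, t)
    probs = f_theta(xt, W)
    lam = 1.0 / t                         # λ_t weighting (assumed form)
    corrupted = xt == MASK                # 1[x_{t,n} != x_{0,n}] for masking
    return -lam * np.sum(np.log(probs[corrupted, x0[corrupted]] + 1e-9))

def sample_action(s, W):
    """Algorithm-2 inference sketch: start from all-masked future tokens;
    at each step, a masked position is unmasked with the posterior
    probability 1/t implied by the linear schedule, else stays masked."""
    x = np.concatenate([s, np.full(N - L_STATE, MASK)])
    for t in range(T, 0, -1):
        probs = f_theta(x, W)
        for n in range(L_STATE, N):
            if x[n] == MASK:
                x0_n = rng.choice(V, p=probs[n])   # x̃_{0,n} ~ Cat(f)
                if rng.random() < 1.0 / t:         # reveal or keep [MASK]
                    x[n] = x0_n
    return x[L_STATE]                     # a = x_{0, l+1}

W = rng.normal(size=(V + 1, V)) * 0.1     # untrained toy parameters
x0 = rng.integers(0, V, size=N)
print("loss:", training_loss(x0, W))
print("sampled action token:", sample_action(x0[:L_STATE], W))
```

At t = 1 the unmasking probability reaches 1, so every future position is resolved by the end of the reverse loop; the predicted action is simply the first token after the state, as in Algorithm 2.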