Implicit Search via Discrete Diffusion: A Study on Chess
Authors: Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive controlled experiments, we show DIFFUSEARCH outperforms both the searchless and explicit search-enhanced policies. Specifically, DIFFUSEARCH outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% on action accuracy. Furthermore, DIFFUSEARCH demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment. |
| Researcher Affiliation | Collaboration | 1 The University of Hong Kong 2 Shanghai Jiao Tong University 3 Huawei Noah's Ark Lab 4 Shanghai AI Laboratory |
| Pseudocode | Yes | Algorithm 1 (DIFFUSEARCH Training). Input: dataset D = {(s, (a, z))}, neural network f(·; θ), timesteps T. Output: model parameters θ. Denote state length l = \|s\|. Repeat: draw (s, (a, z)) ~ D and obtain x_{0,1:N} = s ∥ a ∥ z (∥: concat); draw t ~ Uniform({1, …, T}); draw x_{t,n} ~ q(x_{t,n} \| x_{0,n}) for n ∈ {l+1, …, N}; compute L(θ) = −λ_t Σ_{n=l+1}^{N} 1[x_{t,n} ≠ x_{0,n}] x_{0,n}^⊤ log f(x_{t,n}; θ); minimize L(θ) with respect to θ; until converged. Algorithm 2 (DIFFUSEARCH Inference). Input: board state s, trained network f(·; θ), timesteps T. Output: next action a. Denote state length l = \|s\|. Initialize x_{T,1:l} = s and x_{T,l+1:N} ~ q_noise. For t = T, …, 1: for n = l+1, …, N: draw x̃_{0,n} ~ Cat(f(x_{t,n}; θ)); draw x_{t−1,n} ~ q(x_{t−1,n} \| x_{t,n}, x̃_{0,n}). Return a = x_{0,l+1}. |
| Open Source Code | Yes | All code is publicly available at https://github.com/HKUNLP/DiffuSearch. |
| Open Datasets | Yes | We construct a dataset for supervised training by downloading games from lichess recorded in February 2023. We utilize Stockfish 16, currently the world's strongest search-based engine, as an oracle to label board states extracted from randomly selected games on lichess.org. |
| Dataset Splits | Yes | Table 1 (data statistics; records / games per stage): Train SA-V (100k): 193,189,573 / 100,000; Train SA-V (10k): 17,448,268 / 10,000; Train others (100k): 6,564,661 / 100,000; Train others (10k): 659,576 / 10,000; Action Test: 62,561 / 1,000; Puzzle Test: 36,816 / 10,000. |
| Hardware Specification | Yes | All experiments are done on 8 NVIDIA V100 32G GPUs. |
| Software Dependencies | No | No specific software versions for dependencies like PyTorch, TensorFlow, or Python are mentioned. Although GPT-2 transformer architecture and Adam optimizer are mentioned, their versions are not specified. Stockfish 16 is mentioned as an oracle, not a dependency of their code. |
| Experiment Setup | Yes | We train all baseline models until convergence and set a maximum of 200 epochs for diffusion models due to their slow convergence. We use the Adam optimizer (Kingma & Ba, 2015), a learning rate of 3e-4, and a batch size of 1024 for all models. By default, we set the horizon h to be 4, the number of network layers to be 8 (with a total parameter size of 7M), the diffusion timesteps to be 20, and an absorbing noise type. By default, 100 simulations are utilized in the MCTS-enhanced policy, and its impact is analyzed in Figure 3. We adjust c_puct and τ, constants determining the level of exploration in MCTS, on a held-out set and set them to c_puct = 0.1 and τ = 1 for their superior performance. |
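The two algorithms quoted in the Pseudocode row can be sketched in code. The following is a minimal NumPy toy, not the authors' implementation: it assumes a linear absorbing-mask schedule (mask probability t/T), a λ_t = 1/t loss weight, and a tiny linear stand-in for the paper's GPT-2-style network f(·; θ); the vocabulary size, sequence lengths, and all helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # toy vocabulary size (real model: chess state/action tokens)
MASK = V       # absorbing [MASK] token id
T = 20         # diffusion timesteps (matches the paper's default of 20)
L_STATE = 4    # length l of the conditioning state s
N = 10         # total sequence length of x_0 = s || a || z

def q_sample(x0, t):
    """Absorbing forward process (assumed linear schedule): each future
    token is independently masked with probability t/T; s stays fixed."""
    xt = x0.copy()
    masked = rng.random(N) < t / T
    masked[:L_STATE] = False              # never corrupt the state s
    xt[masked] = MASK
    return xt

def f_theta(xt, W):
    """Stand-in denoising network: per-position softmax over the vocab.
    (The paper uses an 8-layer GPT-2-style transformer; this is a toy.)"""
    onehot = np.eye(V + 1)[xt]            # (N, V+1)
    logits = onehot @ W                   # (N, V)
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def training_loss(x0, W):
    """One Monte Carlo draw of the Algorithm-1 objective: weighted
    cross-entropy of x_0 at the positions corrupted at a random t."""
    t = rng.integers(1, T + 1)
    xt = q_sample(x0, t)
    probs = f_theta(xt, W)
    lam = 1.0 / t                         # λ_t weighting (assumed form)
    corrupted = xt == MASK                # 1[x_{t,n} != x_{0,n}] for masking
    return -lam * np.sum(np.log(probs[corrupted, x0[corrupted]] + 1e-9))

def sample_action(s, W):
    """Algorithm-2 inference sketch: start from all-masked future tokens;
    at each step, a masked position is unmasked with the posterior
    probability 1/t implied by the linear schedule, else stays masked."""
    x = np.concatenate([s, np.full(N - L_STATE, MASK)])
    for t in range(T, 0, -1):
        probs = f_theta(x, W)
        for n in range(L_STATE, N):
            if x[n] == MASK:
                x0_n = rng.choice(V, p=probs[n])   # x̃_{0,n} ~ Cat(f)
                if rng.random() < 1.0 / t:         # reveal or keep [MASK]
                    x[n] = x0_n
    return x[L_STATE]                     # a = x_{0, l+1}

W = rng.normal(size=(V + 1, V)) * 0.1     # untrained toy parameters
x0 = rng.integers(0, V, size=N)
print("loss:", training_loss(x0, W))
print("sampled action token:", sample_action(x0[:L_STATE], W))
```

At t = 1 the unmasking probability reaches 1, so every future position is resolved by the end of the reverse loop; the predicted action is simply the first token after the state, as in Algorithm 2.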