Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport

Authors: Tuan Quang Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D’Eramo, Odalric-Ambrym Maillard

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical guarantees of asymptotic convergence of O(n^{-1/2}), with n as the number of visited trajectories, to the optimal policy and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.
Researcher Affiliation | Academia | 1Hanoi University of Science and Technology, Hanoi, Vietnam 2Department of Computer Science, Technical University of Darmstadt, Germany 3ETH Zürich, Switzerland 4Department of Electrical Engineering and Automation, Aalto University, Finland 5Center for Artificial Intelligence and Data Science, University of Würzburg, Germany 6Hessian Center for Artificial Intelligence (Hessian.ai), Germany 7Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France.
Pseudocode | No | The paper describes the steps and concepts of the Wasserstein Monte-Carlo Tree Search (W-MCTS) algorithm, its backup operator, and action selection strategies using descriptive text and mathematical formulations. However, it does not include a distinct section or figure explicitly labeled as 'Pseudocode' or 'Algorithm'.
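Since the paper provides no pseudocode, the select/backup skeleton described in its text can be caricatured as follows. This is a deliberately simplified sketch, not the paper's Wasserstein backup operator: per-action value uncertainty is tracked as a standard deviation initialized optimistically (std = 30, echoing the quoted setup) and shrunk as 1/sqrt(n), with actions chosen by mean plus an uncertainty bonus. All names and the toy two-action problem are illustrative assumptions.

```python
import math
import random

class Node:
    """Tree node holding a Gaussian-style value estimate per action."""
    def __init__(self, n_actions, std0=30.0):
        self.visits = [0] * n_actions
        self.mean = [0.0] * n_actions
        self.std = [std0] * n_actions  # optimistic initial uncertainty

def select(node, c=1.0):
    """Optimistic action selection: mean plus an uncertainty bonus."""
    scores = [m + c * s for m, s in zip(node.mean, node.std)]
    return max(range(len(scores)), key=scores.__getitem__)

def backup(node, a, ret, std0=30.0):
    """Update action a's estimate with return ret: running-average mean,
    uncertainty contracting as 1/sqrt(visits) along visited branches."""
    node.visits[a] += 1
    n = node.visits[a]
    node.mean[a] += (ret - node.mean[a]) / n
    node.std[a] = std0 / math.sqrt(n)

def run(n_sims=2000, seed=0):
    """Run the loop on a toy two-action stochastic problem where the
    second action is better in expectation; return the most-visited action."""
    rng = random.Random(seed)
    rewards = [lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(1.0, 1.0)]
    root = Node(n_actions=2)
    for _ in range(n_sims):
        a = select(root)
        backup(root, a, rewards[a]())
    return max(range(2), key=root.visits.__getitem__)
```

With enough simulations the uncertainty bonus decays and the loop concentrates visits on the higher-mean action, which is the qualitative behavior the paper's description conveys.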
Open Source Code | No | Code for POMCP(UCT) (Silver & Veness, 2010b), D2NG (Bai et al., 2014), and DESPOT (Somani et al., 2013) is used as released by the original authors.
Open Datasets | Yes | Frozen Lake. A 4x4 grid with slippery transitions, implemented in OpenAI Gym (Brockman et al., 2016). The agent aims to reach a goal in the bottom-right corner. [...] We also test W-MCTS against POMCP(UCT), D2NG, and DESPOT in classic POMDP benchmarks: rocksample, pocman, Tag, and Laser Tag.
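FrozenLake's slippery dynamics (the agent moves in the intended direction with probability 1/3 and slips to each perpendicular direction with probability 1/3, matching Gym's default) can be sketched without the Gym dependency. The class below is an illustrative stand-in for the 4x4 benchmark, not the code used in the paper.

```python
import random

class SlipperyGrid:
    """Minimal 4x4 FrozenLake-style grid with slippery transitions."""
    # Actions in Gym's ordering: 0=Left, 1=Down, 2=Right, 3=Up.
    # Adjacent action indices are perpendicular directions.
    MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

    def __init__(self, size=4, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.pos = (0, 0)                     # start in top-left corner
        self.goal = (size - 1, size - 1)      # goal in bottom-right corner

    def step(self, action):
        # Prob 1/3 each: intended direction or one of the two
        # perpendicular directions (never the opposite one).
        actual = self.rng.choice([(action - 1) % 4, action, (action + 1) % 4])
        dr, dc = self.MOVES[actual]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done
```

The stochastic transition is what makes the benchmark interesting for uncertainty-aware planners: repeating the same action from the same state yields different successor states.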
Dataset Splits | No | The environments used (Frozen Lake, NChain, River Swim, Six Arms, Taxi, rocksample, pocman, Tag, and Laser Tag) are reinforcement learning environments. The paper refers to 'averaged over 50 runs' and 'averaged over 1000 runs' to indicate experiment repetitions, not explicit training/validation/test dataset splits. There is no mention of static dataset partitioning.
Hardware Specification | Yes | All the experiments were done on an Intel(R) Core(TM) i9-14900K 3.20 GHz, 24-core CPU.
Software Dependencies | No | For DNG, D2NG, we set hyperparameters as recommended in the paper and from the author's source code (Bai et al., 2013; 2014). We set exploration constant for UCT, Power-UCT to 2. We set initial standard deviation value to std = 30. The text mentions specific algorithms used as baselines, but does not provide specific version numbers for software libraries or dependencies used in their implementation.
Experiment Setup | Yes | The hyperparameters are tuned using grid-search. Except for the case of the Pocman environment, we scale the rewards into the range [0, 1]. We use the discount factor γ = 0.95. For DNG, D2NG, we set hyperparameters as recommended in the paper and from the author's source code (Bai et al., 2013; 2014). We set exploration constant for UCT, Power-UCT to 2. We set initial standard deviation value to std = 30. In all Rocksample and Pocman environments, we set the heuristic for rollouts as treeknowledge = 0, rolloutknowledge = 1. For all environments, we increase the value of p and choose the best power mean p value for Power-UCT and W-MCTS. Details can be found in Table 6.
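The quoted tuning procedure, grid search over hyperparameters plus reward scaling into [0, 1], can be sketched generically. The `evaluate` callback, the candidate grids, and the scaling helper below are placeholders for illustration, not the paper's actual search code.

```python
import itertools

def scale_rewards(rewards, lo=0.0, hi=1.0):
    """Min-max scale raw rewards into [lo, hi], as the quoted setup
    does for all environments except Pocman."""
    r_min, r_max = min(rewards), max(rewards)
    if r_max == r_min:
        return [lo for _ in rewards]
    return [lo + (hi - lo) * (r - r_min) / (r_max - r_min) for r in rewards]

def grid_search(evaluate, grids):
    """Exhaustive grid search over hyperparameter combinations.
    `evaluate` maps a config dict to a scalar score; the best-scoring
    config and its score are returned."""
    names = list(grids)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(grids[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical grids over an exploration constant c and power-mean p;
# the real candidate values are in the paper's Table 6, not here.
example_grids = {"c": [1, 2, 4], "p": [1, 2, 4, 8]}
```

In the paper's setting, `evaluate` would be the average return of a planner configuration over repeated runs of an environment.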