Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search

Authors: Tuan Quang Dam

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on stochastic tasks validate our theoretical findings, demonstrating the effectiveness of Stochastic-Power-HOOT in continuous, stochastic domains. (Section 7, Experiments): We evaluate Stochastic-Power-HOOT on both classic control tasks and high-dimensional robotic environments, all adapted to continuous-action, stochastic settings.
Researcher Affiliation | Academia | Hanoi University of Science and Technology, Hanoi, Vietnam. Correspondence to: Tuan Dam <EMAIL>.
Pseudocode | Yes | Stochastic-Power-HOOT (pseudocode shown in Alg. 1) iteratively builds a search tree using four main phases: Selection, Expansion, Rollout, and Backpropagation. Algorithm 1: Stochastic-Power-HOOT, where γ is the discount factor. Algorithm 2: Non-stationary Power Mean HOO. Algorithm 3: Power Mean HOO Query. Algorithm 4: Power Mean HOO Update.
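The power-mean backup that gives the algorithm its name can be sketched as follows. This is a minimal illustration assuming nonnegative value estimates and uniform-by-visit weighting; the function name and weighting scheme are our assumptions, not taken from the paper's pseudocode:

```python
def power_mean(values, weights, p):
    """Weighted power mean: (sum_i w_i * x_i**p / sum_i w_i) ** (1/p).

    Assumes nonnegative values x_i and exponent p >= 1. For p = 1 this
    reduces to the ordinary weighted average; as p grows it approaches
    max(values), so p interpolates between average and greedy backups.
    """
    total_weight = sum(weights)
    moment = sum(w * x**p for x, w in zip(values, weights))
    return (moment / total_weight) ** (1.0 / p)


# Interpolation between average (p = 1) and maximum (large p):
child_values = [1.0, 2.0, 3.0]
visit_counts = [1.0, 1.0, 1.0]
avg = power_mean(child_values, visit_counts, p=1.0)        # 2.0
near_max = power_mean(child_values, visit_counts, p=100.0)  # close to 3.0
```

In a backpropagation phase, `values` would be the children's value estimates and `weights` their visit counts; the exponent p controls how far the backup leans toward the maximizing child.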
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It mentions using OpenAI Gym, but this refers to a third-party tool, not the authors' own implementation code.
Open Datasets | Yes | For classic control tasks from OpenAI Gym (Brockman et al., 2016), we evaluate on CartPole, CartPole-IG (increased gravity), Pendulum, MountainCar, and Acrobot. For high-dimensional evaluation, we test on MuJoCo robotics tasks including Humanoid-v0 (17-dimensional action space, 376-dimensional state space) and Hopper-v0 (3-dimensional actions), both modified with comprehensive stochastic noise.
Dataset Splits | No | The paper describes using standard simulation environments (OpenAI Gym, MuJoCo) and modifying them with noise. It does not provide training/validation/test splits, as these are continuous control tasks where agents interact with an environment rather than being evaluated on a static dataset with predefined splits.
Hardware Specification | No | The paper mentions evaluating on MuJoCo robotics tasks but does not specify any hardware details such as GPU/CPU models, memory, or the computing platform used for the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym (Brockman et al., 2016) but does not provide specific version numbers for OpenAI Gym or any other software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | Across all tasks, we use a reward discount factor of γ = 0.99 and a planning horizon of T = 150 steps. The MCTS search depth is set to D = 100 with n = 100 simulations per state. In discretized-UCT, actions are discretized into 10 uniformly sampled values. For HOOT and Stochastic-Power-HOOT, given action-space dimension m, we set ρ = 1/(4m) and ν₁ = 4m. In Stochastic-Power-HOOT, we limit the HOO tree depth to H = 10 with parameters b = 5, β = 20, and α = 10. All algorithms use a rollout policy π₀ with value estimates initialized as V̂(s) = 0 for all states s ∈ S.
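For reference, the reported settings can be collected into a configuration sketch. The key names and function below are illustrative only; the authors did not release code, so this merely restates the numbers from the quoted setup:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Dict keys are our own naming, not from any released implementation.
CONFIG = {
    "gamma": 0.99,            # reward discount factor
    "planning_horizon": 150,  # T, steps
    "search_depth": 100,      # D, MCTS search depth
    "simulations": 100,       # n, simulations per state
    "uct_actions": 10,        # discretized-UCT: uniformly sampled actions
    "hoo_depth": 10,          # H, HOO tree depth limit
    "b": 5,
    "beta": 20,
    "alpha": 10,
}

def hoo_smoothness(m):
    """Dimension-dependent HOO parameters: rho = 1/(4m), nu1 = 4m."""
    return {"rho": 1.0 / (4 * m), "nu1": 4 * m}

# Humanoid-v0 has a 17-dimensional action space:
humanoid = hoo_smoothness(17)  # rho = 1/68, nu1 = 68
```

Tying ρ and ν₁ to the action dimension m keeps the HOO smoothness assumptions consistent across tasks of different dimensionality.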