Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search
Authors: Tuan Quang Dam
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on stochastic tasks validate our theoretical findings, demonstrating the effectiveness of Stochastic-Power-HOOT in continuous, stochastic domains. From Section 7 (Experiments): We evaluate Stochastic-Power-HOOT on both classic control tasks and high-dimensional robotic environments, all adapted to continuous-action, stochastic settings. |
| Researcher Affiliation | Academia | 1Hanoi University of Science and Technology, Hanoi, Vietnam. Correspondence to: Tuan Dam <EMAIL>. |
| Pseudocode | Yes | Stochastic-Power-HOOT (pseudocode shown in Alg. 1) iteratively builds a search tree using four main phases: Selection, Expansion, Rollout, and Backpropagation. Algorithm 1: Stochastic-Power-HOOT (where γ is a discount factor); Algorithm 2: Non-stationary Power Mean HOO; Algorithm 3: Power Mean HOO Query; Algorithm 4: Power Mean HOO Update. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It mentions using OpenAI Gym but this refers to a third-party tool, not the authors' own implementation code. |
| Open Datasets | Yes | For classic control tasks from OpenAI Gym Brockman et al. (2016), we evaluate on CartPole, CartPole-IG (increased gravity), Pendulum, MountainCar, and Acrobot. For high-dimensional evaluation, we test on MuJoCo robotics tasks including Humanoid-v0 (17-dimensional action space, 376-dimensional state space) and Hopper-v0 (3-dimensional actions), both modified with comprehensive stochastic noise. |
| Dataset Splits | No | The paper describes using standard simulation environments (OpenAI Gym, MuJoCo) and modifying them with noise. It does not provide specific training/test/validation dataset splits, as these are continuous control tasks where agents interact with an environment rather than being evaluated on a static dataset with predefined splits. |
| Hardware Specification | No | The paper mentions evaluating on 'MuJoCo robotics tasks' but does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for running the experiments. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym Brockman et al. (2016)' but does not provide specific version numbers for OpenAI Gym or any other software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Across all tasks, we use a reward discount factor of γ = 0.99 and a planning horizon of T = 150 steps. The MCTS search depth is set to D = 100 with n = 100 simulations per state. In discretized-UCT, actions are discretized into 10 uniformly sampled values. For HOOT and Stochastic-Power-HOOT, given action space dimension m, we set ρ = 1/(4m) and ν₁ = 4m. In Stochastic-Power-HOOT, we configure the HOO tree depth limit to H = 10 with parameters b = 5, β = 20, and α = 10. All algorithms use a rollout policy π₀, with value estimates initialized as V̂(s) = 0 for all states s ∈ S. |
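The Backpropagation phase of Stochastic-Power-HOOT aggregates child values with a power mean rather than an arithmetic mean. A minimal sketch of that estimator (the function name and standalone form are our assumptions; the paper's Alg. 1 embeds it in the tree backup):

```python
def power_mean(values, p):
    """Power mean M_p of non-negative values: ((1/n) * sum(x**p)) ** (1/p).

    p = 1 recovers the arithmetic mean used in standard MCTS backups;
    as p grows, M_p interpolates toward the maximum, trading off
    averaging against max-style value estimation.
    """
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)
```

For example, `power_mean(returns, 1)` gives the usual average backup, while a larger exponent biases the node's value estimate toward its best-performing children.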
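The hyperparameters reported under Experiment Setup can be collected into a single configuration sketch; the values come from the paper, but the dictionary layout and function name are our own illustration:

```python
def hoot_params(m):
    """Hyperparameters reported for Stochastic-Power-HOOT, parameterized
    by the action-space dimension m (rho and nu1 scale with m)."""
    return {
        "gamma": 0.99,          # reward discount factor
        "horizon_T": 150,       # planning horizon (steps)
        "search_depth_D": 100,  # MCTS search depth
        "simulations_n": 100,   # simulations per state
        "rho": 1.0 / (4 * m),   # HOO shrink rate
        "nu1": 4 * m,           # HOO diameter constant
        "hoo_depth_H": 10,      # HOO tree depth limit
        "b": 5,
        "beta": 20,
        "alpha": 10,
    }
```

For the Humanoid-v0 task (m = 17), this yields ρ = 1/68 and ν₁ = 68.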