Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search

Authors: Tuan Quang Dam

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on stochastic tasks validate our theoretical findings, demonstrating the effectiveness of Stochastic-Power-HOOT in continuous, stochastic domains. (Section 7, Experiments): We evaluate Stochastic-Power-HOOT on both classic control tasks and high-dimensional robotic environments, all adapted to continuous-action, stochastic settings.
Researcher Affiliation | Academia | Hanoi University of Science and Technology, Hanoi, Vietnam. Correspondence to: Tuan Dam <EMAIL>.
Pseudocode | Yes | Stochastic-Power-HOOT (pseudocode shown in Alg. 1) iteratively builds a search tree using four main phases: Selection, Expansion, Rollout, and Backpropagation. Algorithm 1: Stochastic-Power-HOOT, where γ is the discount factor. Algorithm 2: Non-stationary Power Mean HOO. Algorithm 3: Power Mean HOO Query. Algorithm 4: Power Mean HOO Update.
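The power-mean backup that gives the algorithm its name can be sketched as follows. This is a minimal illustration assuming nonnegative value estimates and uniform-by-visit weighting; the function name and weighting scheme are our assumptions, not taken from the paper's pseudocode:

```python
def power_mean(values, weights, p):
    """Weighted power mean: (sum_i w_i * x_i**p / sum_i w_i) ** (1/p).

    Assumes nonnegative values x_i and exponent p >= 1. For p = 1 this
    reduces to the ordinary weighted average; as p grows it approaches
    max(values), so p interpolates between average and greedy backups.
    """
    total_weight = sum(weights)
    moment = sum(w * x**p for x, w in zip(values, weights))
    return (moment / total_weight) ** (1.0 / p)


# Interpolation between average (p = 1) and maximum (large p):
child_values = [1.0, 2.0, 3.0]
visit_counts = [1.0, 1.0, 1.0]
avg = power_mean(child_values, visit_counts, p=1.0)        # 2.0
near_max = power_mean(child_values, visit_counts, p=100.0)  # close to 3.0
```

In a backpropagation phase, `values` would be the children's value estimates and `weights` their visit counts; the exponent p controls how far the backup leans toward the maximizing child.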
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It mentions using OpenAI Gym, but this refers to a third-party tool, not the authors' own implementation code.
Open Datasets | Yes | For classic control tasks from OpenAI Gym (Brockman et al., 2016), we evaluate on CartPole, CartPole-IG (increased gravity), Pendulum, MountainCar, and Acrobot. For high-dimensional evaluation, we test on MuJoCo robotics tasks including Humanoid-v0 (17-dimensional action space, 376-dimensional state space) and Hopper-v0 (3-dimensional actions), both modified with comprehensive stochastic noise.
Dataset Splits | No | The paper describes using standard simulation environments (OpenAI Gym, MuJoCo) and modifying them with noise. It does not provide training/validation/test splits, as these are continuous control tasks where agents interact with an environment rather than being evaluated on a static dataset with predefined splits.
Hardware Specification | No | The paper mentions evaluating on MuJoCo robotics tasks but does not specify any hardware details such as GPU/CPU models, memory, or the computing platform used for the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym (Brockman et al., 2016) but does not provide specific version numbers for OpenAI Gym or any other software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | Across all tasks, we use a reward discount factor of γ = 0.99 and a planning horizon of T = 150 steps. The MCTS search depth is set to D = 100 with n = 100 simulations per state. In discretized-UCT, actions are discretized into 10 uniformly sampled values. For HOOT and Stochastic-Power-HOOT, given action-space dimension m, we set ρ = 1/(4m) and ν₁ = 4m. In Stochastic-Power-HOOT, we limit the HOO tree depth to H = 10 with parameters b = 5, β = 20, and α = 10. All algorithms use a rollout policy π₀ with value estimates initialized as V̂(s) = 0 for all states s ∈ S.
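For reference, the reported settings can be collected into a configuration sketch. The key names and function below are illustrative only; the authors did not release code, so this merely restates the numbers from the quoted setup:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Dict keys are our own naming, not from any released implementation.
CONFIG = {
    "gamma": 0.99,            # reward discount factor
    "planning_horizon": 150,  # T, steps
    "search_depth": 100,      # D, MCTS search depth
    "simulations": 100,       # n, simulations per state
    "uct_actions": 10,        # discretized-UCT: uniformly sampled actions
    "hoo_depth": 10,          # H, HOO tree depth limit
    "b": 5,
    "beta": 20,
    "alpha": 10,
}

def hoo_smoothness(m):
    """Dimension-dependent HOO parameters: rho = 1/(4m), nu1 = 4m."""
    return {"rho": 1.0 / (4 * m), "nu1": 4 * m}

# Humanoid-v0 has a 17-dimensional action space:
humanoid = hoo_smoothness(17)  # rho = 1/68, nu1 = 68
```

Tying ρ and ν₁ to the action dimension m keeps the HOO smoothness assumptions consistent across tasks of different dimensionality.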