Online Robust Reinforcement Learning Through Monte-Carlo Planning

Authors: Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics. Our contributions are threefold: Robust Empirical Performance: We conduct experiments in two environments (Gambler's Problem and Frozen Lake) to evaluate our robust algorithm, demonstrating that it achieves better robustness to model mismatch than the standard MCTS baseline."
Researcher Affiliation | Academia | "1 Hanoi University of Science and Technology, Hanoi, Vietnam; 2 Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA; 3 Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL."
Pseudocode | Yes | "Algorithm 1: Robust-Power-UCT with γ discount factor."
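The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the Power-UCT family it builds on, the sketch below shows the two ingredients such planners combine: a weighted power-mean value backup and a UCB1-style selection score. The function names, the exponent `p`, and the exploration constant `c` are our own assumptions for illustration, not the paper's actual algorithm.

```python
import math

def power_mean(values, weights, p):
    """Weighted power mean (sum_i (w_i / W) * v_i**p) ** (1/p).

    Assumes nonnegative values (rewards in these tasks lie in [0, 1]);
    p = 1 recovers the ordinary weighted average used by standard UCT.
    """
    total = sum(weights)
    return sum((w / total) * v ** p for v, w in zip(values, weights)) ** (1.0 / p)

def ucb_score(child_value, child_visits, parent_visits, c=1.4):
    """UCB1-style selection score: value estimate plus an exploration bonus.

    Unvisited children get infinite score so each action is tried once.
    """
    if child_visits == 0:
        return float("inf")
    return child_value + c * math.sqrt(math.log(parent_visits) / child_visits)
```

Raising `p` above 1 biases the backup toward the larger child values, which is the key difference between a power-mean backup and the arithmetic average of vanilla UCT.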
Open Source Code | Yes | "We also provide our code at https://github.com/brahimdriss/Robust MCTS."
Open Datasets | Yes | "The Gambler's Problem (Sutton and Barto, 2018): a classic casino-inspired reinforcement learning environment... Frozen Lake (Towers et al., 2024): This environment presents a gridworld navigation challenge..."
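The Gambler's Problem dynamics are fully specified in Sutton and Barto (2018): from capital `s` the gambler stakes up to `min(s, goal - s)`, wins the stake with probability `p_heads`, and receives reward 1 only on reaching the goal. A minimal sketch of that transition (function names and signatures are our own, not the paper's code):

```python
import random

def legal_stakes(capital, goal=100):
    """Stakes are capped by both the current capital and the gap to the goal."""
    return range(1, min(capital, goal - capital) + 1)

def gambler_step(capital, stake, p_heads, goal=100, rng=random):
    """One coin flip: win the stake with probability p_heads, else lose it.

    Returns (next_capital, reward, done); reward is 1.0 only at the goal.
    """
    if rng.random() < p_heads:
        capital += stake
    else:
        capital -= stake
    done = capital <= 0 or capital >= goal
    reward = 1.0 if capital >= goal else 0.0
    return capital, reward, done
```

The robustness question in the paper arises when the `p_heads` used for planning differs from the one used at execution time.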
Dataset Splits | No | The paper uses simulated environments (Gambler's Problem, Frozen Lake, American Option Pricing) rather than datasets with explicit train/validation/test splits. It describes evaluation scenarios under different planning and execution probabilities and runs experiments over 100 seeds, but specifies no dataset splits in the conventional sense.
Hardware Specification | No | "This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011015599 made by GENCI." An HPC resource is acknowledged, but no specific hardware details (CPU/GPU models, memory) are given.
Software Dependencies | No | "We implement our robust MCTS framework by extending a base Monte Carlo Tree Search implementation from (Leurent, 2018)." A software implementation is mentioned, but no version numbers for 'rl-agents' or any other libraries are provided.
Experiment Setup | Yes | "All experiments are done over 100 seeds, using γ = 0.99 and robustness budget ρ = 0.5, with these values showing consistent performance across preliminary experiments with different parameter settings. We use 2000 rollouts for the Gambler's Problem and 4000 rollouts for Frozen Lake."
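To make the reported configuration concrete, the constants below mirror the stated hyperparameters, and `discounted_return` shows how the discount factor γ enters the objective being estimated. The dictionary keys and the helper function are our own illustrative naming, not the authors' code.

```python
# Hyperparameters as reported: 100 seeds, gamma = 0.99, robustness budget
# rho = 0.5, with 2000 rollouts (Gambler's Problem) and 4000 (Frozen Lake).
GAMMA, RHO, N_SEEDS = 0.99, 0.5, 100
ROLLOUTS = {"gambler": 2000, "frozen_lake": 4000}

def discounted_return(rewards, gamma=GAMMA):
    """Right-fold a reward sequence into the discounted sum of gamma**t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With γ = 0.99, a unit reward reached after t steps contributes 0.99**t to the return, so deep rollouts still register meaningfully; this is consistent with the fairly large rollout budgets used.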