Online Robust Reinforcement Learning Through Monte-Carlo Planning
Authors: Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics. Our contributions are threefold: Robust Empirical Performance: We conduct experiments in two environments (Gambler's Problem and Frozen Lake) to evaluate our robust algorithm, demonstrating that it achieves superior robust performance under model mismatches compared to the standard MCTS algorithm baseline. |
| Researcher Affiliation | Academia | 1Hanoi University of Science and Technology, Hanoi, Vietnam 2Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA 3Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189-CRIStAL. |
| Pseudocode | Yes | Algorithm 1: Robust-Power-UCT with γ discount factor. |
| Open Source Code | Yes | We also provide our code at https://github.com/brahimdriss/Robust_MCTS. |
| Open Datasets | Yes | The Gambler's Problem (Sutton and Barto, 2018): a classic casino-inspired reinforcement learning environment... Frozen Lake (Towers et al., 2024): This environment presents a gridworld navigation challenge... |
| Dataset Splits | No | The paper uses simulated environments (Gambler's Problem, Frozen Lake, American Option Pricing) rather than traditional datasets with explicit train/test/validation splits. It describes evaluation scenarios under different planning and execution probabilities and uses '100 seeds' for experiments, but does not specify dataset splits in the conventional sense. |
| Hardware Specification | No | This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011015599 made by GENCI. This mentions a High-Performance Computing resource but lacks specific hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | We implement our robust MCTS framework by extending a base Monte Carlo Tree Search implementation from (Leurent, 2018). While a software implementation is mentioned, no specific version number for 'rl-agents' or any other libraries is provided. |
| Experiment Setup | Yes | All experiments are done over 100 seeds, using γ = 0.99 and robustness budget ρ = 0.5, with these values showing consistent performance across preliminary experiments with different parameter settings. We use 2000 rollouts for the Gambler's Problem and 4000 rollouts for Frozen Lake. |
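
The reported setup can be summarized in a small configuration sketch. This is illustrative only, assuming a hypothetical `ExperimentConfig` structure (not the authors' code); the numeric values are the ones quoted from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the hyperparameters reported in the paper."""
    env: str
    rollouts: int
    gamma: float = 0.99   # discount factor used in all experiments
    rho: float = 0.5      # robustness budget used in all experiments
    seeds: int = 100      # every experiment is run over 100 seeds

# Per-environment rollout budgets quoted from the experiment setup.
CONFIGS = [
    ExperimentConfig(env="GamblersProblem", rollouts=2000),
    ExperimentConfig(env="FrozenLake", rollouts=4000),
]

for cfg in CONFIGS:
    print(f"{cfg.env}: {cfg.rollouts} rollouts, "
          f"gamma={cfg.gamma}, rho={cfg.rho}, seeds={cfg.seeds}")
```

A structure like this makes it easy to check, at a glance, which settings are shared across environments (γ, ρ, seed count) and which differ (rollout budget).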