Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves

Authors: Martin Kurečka, Václav Nevyhoštěný, Petr Novotný, Vít Unčovský

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature. We evaluated the algorithms on three tasks: two of them are variants of the established Gridworld benchmark (Br azdil et al. 2020; Undurti and How 2010); the third is based on the Manhattan environment (Blahoudek et al. 2020), where the agent navigates through mid-town Manhattan, avoiding traffic jams captured by a stochastic model.
Researcher Affiliation Academia 1Faculty of Informatics, Masaryk University Botanick a 68a, 612 00 Brno, Czech Republic EMAIL
Pseudocode Yes Algorithm 1: Threshold UCT
Open Source Code Yes The code is available at https://github.com/kurecka/rats/tree/ AAAI25.
Open Datasets Yes The task is evaluated on a set of 128 small maps (6 6 grid, 5 gold, approx. 103 states) and 64 large maps (25 25 grid, 50 gold, approx. 1017 states). All maps are sampled from our map generator, which guarantees a varying distribution of traps, walls, and gold. We provide the generator and the resulting map datasets Gridworld Small and Gridworld Large in the the project repository. ... The setup involves navigation in the eponymous Manhattan environment implemented in (Blahoudek et al. 2020).
Dataset Splits Yes The task is evaluated on a set of 128 small maps (6 6 grid, 5 gold, approx. 103 states) and 64 large maps (25 25 grid, 50 gold, approx. 1017 states). All maps are sampled from our map generator, which guarantees a varying distribution of traps, walls, and gold. We provide the generator and the resulting map datasets Gridworld Small and Gridworld Large in the the project repository.
Hardware Specification No The full description of configurations and of the hardware setup is in the extended version (Kureˇcka et al. 2024).
Software Dependencies No We implemented T-UCT, RAMCP, and CC-POMCP algorithms within a common C++ based MCTS framework, so as to maximize the amount of shared code among the algorithms, reducing the influence of implementation details on the experiments. The Gridworld environments are implemented as part of our C++ codebase, while the Manhattan environment is built on top of the Python implementation provided by (Blahoudek et al. 2021).
Experiment Setup Yes We performed 300 independent runs on each configuration. The algorithms were given a fixed time per decision summarized in Figure 2; note the time limit on Manhattan is looser so as to compensate for the slower Python implementation of the environment. The runs on Gridworld Small-based tasks had T = 100 steps, while the runs on Gridworld Large and Manhattan maps had T = 200 steps.