reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves

Authors: Martin Kurečka, Václav Nevyhoštěný, Petr Novotný, Vít Unčovský

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature. We evaluated the algorithms on three tasks: two of them are variants of the established Gridworld benchmark (Br azdil et al. 2020; Undurti and How 2010); the third is based on the Manhattan environment (Blahoudek et al. 2020), where the agent navigates through mid-town Manhattan, avoiding traffic jams captured by a stochastic model.
Researcher Affiliation	Academia	1Faculty of Informatics, Masaryk University Botanick a 68a, 612 00 Brno, Czech Republic EMAIL
Pseudocode	Yes	Algorithm 1: Threshold UCT
Open Source Code	Yes	The code is available at https://github.com/kurecka/rats/tree/ AAAI25.
Open Datasets	Yes	The task is evaluated on a set of 128 small maps (6 6 grid, 5 gold, approx. 103 states) and 64 large maps (25 25 grid, 50 gold, approx. 1017 states). All maps are sampled from our map generator, which guarantees a varying distribution of traps, walls, and gold. We provide the generator and the resulting map datasets Gridworld Small and Gridworld Large in the the project repository. ... The setup involves navigation in the eponymous Manhattan environment implemented in (Blahoudek et al. 2020).
Dataset Splits	Yes	The task is evaluated on a set of 128 small maps (6 6 grid, 5 gold, approx. 103 states) and 64 large maps (25 25 grid, 50 gold, approx. 1017 states). All maps are sampled from our map generator, which guarantees a varying distribution of traps, walls, and gold. We provide the generator and the resulting map datasets Gridworld Small and Gridworld Large in the the project repository.
Hardware Specification	No	The full description of configurations and of the hardware setup is in the extended version (Kureˇcka et al. 2024).
Software Dependencies	No	We implemented T-UCT, RAMCP, and CC-POMCP algorithms within a common C++ based MCTS framework, so as to maximize the amount of shared code among the algorithms, reducing the influence of implementation details on the experiments. The Gridworld environments are implemented as part of our C++ codebase, while the Manhattan environment is built on top of the Python implementation provided by (Blahoudek et al. 2021).
Experiment Setup	Yes	We performed 300 independent runs on each configuration. The algorithms were given a fixed time per decision summarized in Figure 2; note the time limit on Manhattan is looser so as to compensate for the slower Python implementation of the environment. The runs on Gridworld Small-based tasks had T = 100 steps, while the runs on Gridworld Large and Manhattan maps had T = 200 steps.