Online Planning in MDPs with Stochastic Durative Actions
Authors: Tal Berman, Ronen I. Brafman, Erez Karpas
IJCAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Overall, we present the first online planner for stochastic temporal planning, solving a richer problem representation than previous work while achieving state-of-the-art empirical results. (D) A comprehensive empirical evaluation on a problem set larger than prior work, including existing and novel domains, demonstrating improved scalability and scope. Code, domains, additional results and discussion can be found at https://github.com/tali Berman5/TP MCTS. We conducted experiments with two possible decision-time budgets: 1 and 10 seconds and domain instances of different sizes. All algorithms were implemented in Python. Experiments were run on an an AMD EPYC 7702P 64-Core Processor. Each experiment was repeated 100 times. Tables 1 and 2 show the success rate of each planner on 100 trials for 1 and 10 seconds per decision. |
| Researcher Affiliation | Academia | 1Ben-Gurion University 2Technion EMAIL,EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 TP-MCTS (root-interval version) |
| Open Source Code | Yes | Code, domains, additional results and discussion can be found at https://github.com/tali Berman5/TP MCTS. |
| Open Datasets | Yes | Our comparison covers more domains than previous work. We consider five structured domains that model realworld problems: NASA Rover and Machine Shop, as used by [Mausam and Weld, 2008], Stuck-Car, a new domain with more interesting stochastic effects, and two new domains with required concurrency: Hosting and Probabilistic Match Cellar. We also use three synthetic domains to study basic properties of the algorithms. We conducted experiments with two possible decision-time budgets: 1 and 10 seconds and domain instances of different sizes. All algorithms were implemented in Python. Experiments were run on an AMD EPYC 7702P 64-Core Processor. Each experiment was repeated 100 times. Our GIT contains additional results. The following general trends emerge: (1) MW-MCTS almost always dominates MW-RTDP. RTDP is better only in the smaller Simple domain, where, due to action noninteraction, it can update the value function quickly. (2) TPMCTS dominates in almost all domains. There are two exceptions: In Nasa Rover(1), MW benefits from its shorter solution depth given 1sec. This advantage disappears in the larger Nasa Rover(2). MW-RTDP has a higher success rate in Prob-Conc+8 given 10sec. Given the performance on other Prob-Conc variants, this may be due to the variance in the success estimates. The impact of domain size on the algorithms is clearly visible in the three structured domains. (3) Additional search time almost always leads to increased success rates for all algorithms, but trends (1)-(2) hold in both tables. (4) Hosting-2 and Prob MC(i) are not solvable by MW s method due to the need for required concurrency. Simple-x highlight the difficulty MW s algorithm has scaling up as the number of non-mutex actions increases due to the exponential growth in the number of legal action combinations. Although the solution depth is smaller, the algorithm does not have sufficient time to explore all actions and has a low chance of detecting the solution. With more time, it slightly scales up. TP-MCTS requires deeper solutions, but MCTS combined with PTRPG can focus exploration on more promising paths. Similar behavior is observed in Nasa Rover(2), Machine Shop(2), and Machine Shop(3). Comparing earliest and root-interval, earliest performs better given less time when temporal constraints are not complex because it can expand more nodes. Thus, trying to start an action as soon as possible is often a good heuristic. However, with sufficient time, root-interval succeeds more often, and in domains with more complex temporal dependencies, such as Hosting, only it can solve the problem reliably. Additional experiments (see GIT) show that: (1) When deadlines are relaxed, success rates increase, but makespan is rarely impacted. (2) PTRPG s evaluation formula can have a significant impact on success rates, Nevertheless, but the relative strength of all variants remains similar, with TP-MCTS remaining much stronger than the MW variants. |
| Dataset Splits | No | The paper mentions experiments conducted on |
| Hardware Specification | Yes | Experiments were run on an AMD EPYC 7702P 64-Core Processor. |
| Software Dependencies | No | The paper states, "All algorithms were implemented in Python." and "Our TP-MCTS implementation supports domain representations written in the Unified Planning Framework [Kapellos et al., 2023]." However, no specific version numbers are provided for Python or the Unified Planning Framework. |
| Experiment Setup | Yes | We conducted experiments with two possible decision-time budgets: 1 and 10 seconds and domain instances of different sizes. All algorithms were implemented in Python. Experiments were run on an AMD EPYC 7702P 64-Core Processor. Each experiment was repeated 100 times. Deadlines were set to the length of an optimal plan + time for one action failure, so as to be challenging. |