Sketch Decompositions for Classical Planning via Deep Reinforcement Learning

Authors: Michael Aichmüller, Hector Geffner

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The sketch decompositions obtained through this method are evaluated experimentally across various domains; a problem is regarded as solved by the decomposition when the goal is reached through a greedy sequence of IW(k) searches. The experiments aim to address several key questions.
Researcher Affiliation | Academia | Michael Aichmüller, Hector Geffner, RWTH Aachen University, Germany, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Actor-Critic RL for generalized planning
Open Source Code | Yes | Appendix available on arXiv (arxiv.org/abs/2412.08574); code and data available on Zenodo (zenodo.org/records/15614893).
Open Datasets | Yes | Appendix available on arXiv (arxiv.org/abs/2412.08574); code and data available on Zenodo (zenodo.org/records/15614893). The domains and training data come primarily from previous work on learning sketches and general policies [Drexler et al., 2022; Ståhlberg et al., 2023], and include Blocks with single and multiple target towers, Childsnack, Delivery, Grid, Gripper, Logistics, Miconic, Reward, Spanner, and Visitall.
Dataset Splits | No | The domains and training data come primarily from previous work on learning sketches and general policies [Drexler et al., 2022; Ståhlberg et al., 2023], and include Blocks with single and multiple target towers, Childsnack, Delivery, Grid, Gripper, Logistics, Miconic, Reward, Spanner, and Visitall. Each domain is tested on 40 larger instances, which extend those used in prior studies (details in the appendix). The paper mentions "training data", a "validation set", and "test instances", but does not provide specific split percentages or a methodology for reproducing these splits.
Hardware Specification | Yes | The Actor-Critic algorithm uses a discount factor γ = 0.999, a learning rate α = 2 × 10⁻⁴, and the Adam optimizer [Kingma and Ba, 2015], and runs on a single NVIDIA A10 GPU for up to 48 hours per domain. In the table, LM indicates the plan length computed by the classical planner LAMA, run on an Intel Xeon Platinum 8352M CPU with a 10-minute time limit and a 100 GB memory limit.
Software Dependencies | No | We use the DRL implementation from [Ståhlberg et al., 2023] with the same hyperparameters to learn the policy π that defines the decomposition G^π_k. The paper mentions a DRL implementation and the Adam optimizer, but does not specify software names with version numbers for reproducibility; it also mentions the classical planner LAMA without a version.
Experiment Setup | Yes | The GNN has feature vectors of size 64 and 30 layers. The Actor-Critic algorithm uses a discount factor γ = 0.999, a learning rate α = 2 × 10⁻⁴, and the Adam optimizer [Kingma and Ba, 2015], and runs on a single NVIDIA A10 GPU for up to 48 hours per domain. Five models are trained independently with different seeds, and the model with the best validation score is selected for testing. The validation score is the ratio L_V / L*_V, where L_V is the plan length from SIWπ(k) and L*_V is the optimal plan length, both averaged over all states of a validation set. Training is stopped early once this ratio approaches 1.0.
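The model-selection rule described in the experiment setup (pick the seed whose validation ratio L_V / L*_V is closest to the ideal value of 1.0, and stop training early when the ratio approaches 1.0) can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, and the stopping tolerance are assumptions, not taken from the authors' code.

```python
def validation_score(plan_lengths, optimal_lengths):
    """Average ratio L_V / L*_V over the validation states.

    plan_lengths: plan lengths produced by the learned policy (SIW_pi(k));
    optimal_lengths: optimal plan lengths for the same states.
    A score of 1.0 means the policy's plans are optimal on average.
    """
    ratios = [l / opt for l, opt in zip(plan_lengths, optimal_lengths)]
    return sum(ratios) / len(ratios)


def select_best_model(models):
    """Among independently trained seeds, keep the model whose
    validation score is closest to the ideal ratio 1.0."""
    return min(models, key=lambda m: abs(m["score"] - 1.0))


def should_stop_early(score, tolerance=0.01):
    """Early stopping once L_V / L*_V approaches 1.0.

    The tolerance value is a hypothetical choice; the paper only says
    training stops when the ratio 'approaches 1.0'.
    """
    return abs(score - 1.0) <= tolerance
```

For example, with two seeds scoring 1.30 and 1.05, `select_best_model` keeps the second, and `should_stop_early(1.005)` reports that training can stop.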