Sketch Decompositions for Classical Planning via Deep Reinforcement Learning

Authors: Michael Aichmüller, Hector Geffner

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The sketch decompositions obtained through this method are evaluated experimentally across various domains; a problem is regarded as solved by the decomposition when the goal is reached through a greedy sequence of IW(k) searches. The experiments aim to address several key questions.
Researcher Affiliation | Academia | Michael Aichmüller, Hector Geffner, RWTH Aachen University, Germany, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Actor-Critic RL for generalized planning
Open Source Code | Yes | Appendix available on arXiv (arxiv.org/abs/2412.08574); code and data available on Zenodo (zenodo.org/records/15614893).
Open Datasets | Yes | Appendix available on arXiv (arxiv.org/abs/2412.08574); code and data available on Zenodo (zenodo.org/records/15614893). The domains and training data come primarily from previous work on learning sketches and general policies [Drexler et al., 2022; Ståhlberg et al., 2023], and include Blocks with single and multiple target towers, Childsnack, Delivery, Grid, Gripper, Logistics, Miconic, Reward, Spanner, and Visitall.
Dataset Splits | No | The domains and training data come primarily from previous work on learning sketches and general policies [Drexler et al., 2022; Ståhlberg et al., 2023], and include Blocks with single and multiple target towers, Childsnack, Delivery, Grid, Gripper, Logistics, Miconic, Reward, Spanner, and Visitall. Each domain is tested on 40 larger instances, which extend those used in prior studies (details in the appendix). The paper mentions "training data", a "validation set", and "test instances", but does not provide specific split percentages or a methodology for reproducing these splits.
Hardware Specification | Yes | The Actor-Critic algorithm uses a discount factor γ = 0.999, a learning rate α = 2 × 10⁻⁴, and the Adam optimizer [Kingma and Ba, 2015], and runs on a single NVIDIA A10 GPU for up to 48 hours per domain. In the table, LM indicates the plan length computed by the classical planner LAMA, run on an Intel Xeon Platinum 8352M CPU with a 10-minute time limit and a 100 GB memory limit.
Software Dependencies | No | We use the DRL implementation from [Ståhlberg et al., 2023] with the same hyperparameters to learn the policy π that defines the decomposition G^π_k. The paper mentions a DRL implementation and the Adam optimizer, but does not specify software names with version numbers for reproducibility; it also mentions the classical planner LAMA without a version.
Experiment Setup | Yes | The GNN has feature vectors of size 64 and 30 layers. The Actor-Critic algorithm uses a discount factor γ = 0.999, a learning rate α = 2 × 10⁻⁴, and the Adam optimizer [Kingma and Ba, 2015], and runs on a single NVIDIA A10 GPU for up to 48 hours per domain. Five models are trained independently with different seeds, and the model with the best validation score is selected for testing. The validation score is the ratio L_V / L*_V, where L_V is the plan length from SIWπ(k) and L*_V is the optimal plan length, both averaged over all states of a validation set. Training is stopped early once this ratio approaches 1.0.
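The model-selection rule described in the experiment setup (pick the seed whose validation ratio L_V / L*_V is closest to the ideal value of 1.0, and stop training early when the ratio approaches 1.0) can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, and the stopping tolerance are assumptions, not taken from the authors' code.

```python
def validation_score(plan_lengths, optimal_lengths):
    """Average ratio L_V / L*_V over the validation states.

    plan_lengths: plan lengths produced by the learned policy (SIW_pi(k));
    optimal_lengths: optimal plan lengths for the same states.
    A score of 1.0 means the policy's plans are optimal on average.
    """
    ratios = [l / opt for l, opt in zip(plan_lengths, optimal_lengths)]
    return sum(ratios) / len(ratios)


def select_best_model(models):
    """Among independently trained seeds, keep the model whose
    validation score is closest to the ideal ratio 1.0."""
    return min(models, key=lambda m: abs(m["score"] - 1.0))


def should_stop_early(score, tolerance=0.01):
    """Early stopping once L_V / L*_V approaches 1.0.

    The tolerance value is a hypothetical choice; the paper only says
    training stops when the ratio 'approaches 1.0'.
    """
    return abs(score - 1.0) <= tolerance
```

For example, with two seeds scoring 1.30 and 1.05, `select_best_model` keeps the second, and `should_stop_early(1.005)` reports that training can stop.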