State Encodings for GNN-Based Lifted Planners
Authors: Rostislav Horčik, Gustav Šír, Vítězslav Šimek, Tomáš Pevný
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper provides an extensive comparative analysis of existing encodings. Our results indicate that the smallest encoding based on Gaifman graphs, not yet applied in planning, outperforms the rest due to its fast evaluation times and the informativeness of the resulting heuristic. The overall coverage measured on the IPC almost reaches that of the state-of-the-art planner LAMA while exhibiting rather complementary strengths across different domains. [...] In particular, we study not only the obtained coverage for a given time limit but also the size of the state encodings, evaluation time, the number of expanded states during the search, and plan lengths. To evaluate the state encodings, we apply them offline to simultaneously measure the evaluation time and the informativeness via the number of expanded states. [...] Section 5 summarizes our experimental results. [...] The overall coverage (i.e., the amount of solved problems for each planning domain) is shown in Table 1. |
| Researcher Affiliation | Academia | Rostislav Horčík, Gustav Šír, Vítězslav Šimek, Tomáš Pevný Czech Technical University in Prague EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the GNN message-passing architecture and loss function using mathematical notation and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/pevnak/NeuroPlanner.jl |
| Open Datasets | Yes | The GNN-based pipeline was tested on the IPC 2023 learning track (Seipp and Segovia-Aguas 2023). To train the GNNs, we used the optimal plans for the training instances that we were able to solve by an optimal planner. |
| Dataset Splits | No | Testing instances were randomly split into two parts: one part was used to select the best hyperparameters based on coverage, while the second part's coverage was measured. The roles of the parts were then swapped, and coverage was measured for the first part instead. This describes a cross-validation strategy for hyperparameter tuning on *testing instances*, but it does not specify the global train/test/validation split or the split sizes for the full IPC 2023 learning track dataset used in the experiments. |
| Hardware Specification | Yes | Each domain was evaluated on AMD EPYC 7543 using eight cores and 64GB of memory limit allocated by the Slurm scheduler. |
| Software Dependencies | No | The paper provides a GitHub link to a Julia project, implying Julia is used, but it does not specify any version numbers for Julia or any other software libraries or dependencies. |
| Experiment Setup | Yes | The concrete functions comb and msg are implemented as a single linear layer followed by the ReLU activation function. The read-out function ro is formed by two linear layers interleaved with ReLU. For the aggregation function we use, based on our experience, both sum and max, i.e., agg(M) = [sum(M), max(M)]. Regarding the hyperparameters, there is only a single hidden dimension of vertex labels v(i) for i > 0 that is taken from {4, 16, 32}. The number of GNN layers ranges from 1 to 3. [...] We applied the ranking loss function L introduced in (Chrestien et al. 2023). |
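The Experiment Setup row above specifies the layer structure quite concretely: msg and comb are each a single linear layer followed by ReLU, the aggregation concatenates sum and max over neighbour messages, and the read-out ro is two linear layers interleaved with ReLU. A minimal NumPy sketch of one such message-passing layer is given below. This is not the authors' Julia implementation (NeuroPlanner.jl); the class and function names, the weight initialisation, and the mean pooling before the read-out are all illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MessagePassingLayer:
    """One GNN layer per the paper's description: msg and comb are each a
    single linear layer + ReLU; agg(M) = [sum(M), max(M)] concatenated.
    Names and shapes are illustrative, not the paper's code."""

    def __init__(self, in_dim, hidden_dim, rng):
        self.W_msg = rng.standard_normal((in_dim, hidden_dim)) * 0.1
        # comb sees the vertex's own features plus the concatenated
        # [sum, max] aggregation -> in_dim + 2 * hidden_dim inputs
        self.W_comb = rng.standard_normal((in_dim + 2 * hidden_dim, hidden_dim)) * 0.1

    def __call__(self, X, adj):
        # X: (n_vertices, in_dim); adj[v] lists the neighbours of vertex v
        M = relu(X @ self.W_msg)  # per-vertex messages
        out = np.empty((X.shape[0], self.W_comb.shape[1]))
        for v, nbrs in enumerate(adj):
            nb = M[nbrs] if nbrs else np.zeros((1, M.shape[1]))
            agg = np.concatenate([nb.sum(axis=0), nb.max(axis=0)])
            out[v] = relu(np.concatenate([X[v], agg]) @ self.W_comb)
        return out

def readout(H, W1, W2):
    # ro: two linear layers interleaved with ReLU; mean pooling over
    # vertices is an assumption (the pooling is not specified above)
    return relu(H.mean(axis=0) @ W1) @ W2
```

A hyperparameter sweep in the reported range would then vary `hidden_dim` over {4, 16, 32} and stack 1 to 3 such layers before the read-out.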