Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning More Expressive General Policies for Classical Planning Domains

Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results illustrate the clear performance gains of R-GNN[t] over the plain R-GNNs, and also over Edge Transformers that also approximate 3-GNNs. Our experiments demonstrate that R-GNN[t], even with small values of t, is practically feasible and significantly improves both the coverage and the quality of the learned general plans when compared to four baselines... Results: Tables 1 and 2 show the results."
Researcher Affiliation | Academia | Simon Ståhlberg (1), Blai Bonet (2), Hector Geffner (1); (1) RWTH Aachen University, Germany; (2) Universitat Pompeu Fabra, Spain. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Relational GNN (R-GNN)
Open Source Code | Yes | Code, data and models: https://zenodo.org/records/14505092
Open Datasets | Yes | "Code, data and models: https://zenodo.org/records/14505092. Domains: Brief descriptions of the domains used in the experiments, mostly taken from Ståhlberg, Bonet, and Geffner (2022a; 2022b; 2023), follow. In all cases, the instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks, Grid, Gripper, Logistics, Miconic, Rovers, Vacuum, Visitall."
Dataset Splits | Yes | "For each domain in the benchmark, we learn a general value function V in a supervised manner from the optimal values V(S) over a small collection of training states S. The instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks: In Blocks-s (resp. Blocks-m), a single tower (resp. multiple towers) must be built. Both have training and validation sets with 4 to 9 blocks. The test set for Blocks-s (resp. Blocks-m) has 10 to 17 blocks (resp. up to 20 blocks)."
Hardware Specification | Yes | "trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours"
Software Dependencies | No | The paper mentions "implemented the architectures in PyTorch" and "using Adam", but does not provide specific version numbers for PyTorch or any other libraries or frameworks.
Experiment Setup | Yes | "For learning value functions, we implemented the architectures in PyTorch, and trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours, using Adam (Kingma and Ba 2015) with a learning rate of 0.0002, batches of size 16, and without applying any regularization loss. We used embedding dimension k = 64, L = 30 layers for R-GNN, R-GNN[t] and the ETs. For R-GNN2, we used k = 32 to avoid running out of memory during training. In all approaches, all layers share weights, and the ETs have 8 self-attention heads."
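For anyone attempting a reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is only a sketch: `TrainConfig` is a hypothetical name (the authors' training script is in the Zenodo archive, not reproduced here), and the values below are exactly those stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters reported in the paper."""
    optimizer: str = "Adam"          # Kingma and Ba (2015)
    learning_rate: float = 2e-4      # reported as 0.0002
    batch_size: int = 16
    embedding_dim: int = 64          # k = 64 (k = 32 for R-GNN2 to fit in memory)
    num_layers: int = 30             # L = 30; all layers share weights
    attention_heads: int = 8         # Edge Transformers (ETs) only
    training_hours: int = 12         # on an NVIDIA A10 GPU, 24 GB
    regularization_loss: bool = False  # no regularization loss applied


cfg = TrainConfig()
print(cfg)
```

The frozen dataclass makes the configuration immutable, which helps keep training runs comparable across the R-GNN, R-GNN[t], and ET variants.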