Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning More Expressive General Policies for Classical Planning Domains
Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results illustrate the clear performance gains of R-GNN[t] over the plain R-GNNs, and also over Edge Transformers that also approximate 3-GNNs. Our experiments demonstrate that R-GNN[t], even with small values of t, is practically feasible and significantly improves both the coverage and the quality of the learned general plans when compared to four baselines... Results. Tables 1 and 2 show the results. |
| Researcher Affiliation | Academia | Simon Ståhlberg1, Blai Bonet2, Hector Geffner1 1RWTH Aachen University, Germany 2Universitat Pompeu Fabra, Spain EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Relational GNN (R-GNN) |
| Open Source Code | Yes | Code, data and models: https://zenodo.org/records/14505092. |
| Open Datasets | Yes | Code, data and models: https://zenodo.org/records/14505092. Domains Brief descriptions of the domains used in the experiments, mostly taken from Ståhlberg, Bonet, and Geffner (2022a; 2022b; 2023), follow. In all cases, the instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks. Grid. Gripper. Logistics. Miconic. Rovers. Vacuum. Visitall. |
| Dataset Splits | Yes | For each domain in the benchmark, we learn a general value function V in a supervised manner from the optimal values V (S) over a small collection of training states S. The instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks. In Blocks-s (resp. Blocks-m), a single tower (resp. multiple towers) must be built. Both have training and validation sets with 4 to 9 blocks. The test set for Blocks-s (resp. Blocks-m) has 10 to 17 blocks (resp. up to 20 blocks). |
| Hardware Specification | Yes | trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours |
| Software Dependencies | No | The paper mentions "implemented the architectures in PyTorch" and "using Adam", but does not provide specific version numbers for PyTorch or any other libraries or frameworks. |
| Experiment Setup | Yes | For learning value functions, we implemented the architectures in PyTorch, and trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours, using Adam (Kingma and Ba 2015) with a learning rate of 0.0002, batches of size 16, and without applying any regularization loss. We used embedding dimension k = 64, L = 30 layers for R-GNN, R-GNN[t] and the ETs. For R-GNN2, we used k = 32 to avoid running out of memory during training. In all approaches, all layers share weights, and the ETs have 8 self-attention heads. |
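The Experiment Setup row above lists concrete hyperparameters (Adam with learning rate 0.0002, batch size 16, embedding dimension k = 64, L = 30 weight-shared layers, no regularization loss). The following minimal PyTorch sketch shows how those reported settings translate into code; the `SharedLayerNet` module and its inner MLP layer are placeholders of our own, standing in for the paper's actual R-GNN layer (given in its Algorithm 1), not the authors' implementation.

```python
# Hypothetical sketch of the reported training configuration:
# Adam, lr = 2e-4, batch size 16, k = 64, L = 30 weight-shared layers.
import torch
import torch.nn as nn

class SharedLayerNet(nn.Module):
    """Placeholder network: one layer applied L times ("all layers share weights")."""

    def __init__(self, k: int = 64, num_layers: int = 30):
        super().__init__()
        self.num_layers = num_layers
        # A single layer module reused at every step, so parameters are shared
        # across all L layers, as the paper specifies for all its architectures.
        self.layer = nn.Sequential(nn.Linear(k, k), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_layers):
            x = self.layer(x)
        return x

model = SharedLayerNet(k=64, num_layers=30)
# Adam at the reported learning rate; no weight decay, matching
# "without applying any regularization loss".
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

batch = torch.randn(16, 64)  # batch size 16, embedding dimension k = 64
out = model(batch)
print(tuple(out.shape))
```

Because the single shared layer is applied 30 times, the parameter count stays that of one k-by-k layer regardless of depth, which is consistent with the paper reducing k (to 32 for R-GNN2) rather than L when memory ran out.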