Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning More Expressive General Policies for Classical Planning Domains
Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results illustrate the clear performance gains of R-GNN[t] over the plain R-GNNs, and also over Edge Transformers that also approximate 3-GNNs. Our experiments demonstrate that R-GNN[t], even with small values of t, is practically feasible and significantly improves both the coverage and the quality of the learned general plans when compared to four baselines... Results. Tables 1 and 2 show the results. |
| Researcher Affiliation | Academia | Simon Ståhlberg1, Blai Bonet2, Hector Geffner1 1RWTH Aachen University, Germany 2Universitat Pompeu Fabra, Spain EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Relational GNN (R-GNN) |
| Open Source Code | Yes | Code, data and models: https://zenodo.org/records/14505092. |
| Open Datasets | Yes | Code, data and models: https://zenodo.org/records/14505092. Domains Brief descriptions of the domains used in the experiments, mostly taken from Ståhlberg, Bonet, and Geffner (2022a; 2022b; 2023), follow. In all cases, the instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks. Grid. Gripper. Logistics. Miconic. Rovers. Vacuum. Visitall. |
| Dataset Splits | Yes | For each domain in the benchmark, we learn a general value function V in a supervised manner from the optimal values V (S) over a small collection of training states S. The instances in the training set are small, while those in the test set are significantly larger as they contain more objects. Blocks. In Blocks-s (resp. Blocks-m), a single tower (resp. multiple towers) must be built. Both have training and validation sets with 4 to 9 blocks. The test set for Blocks-s (resp. Blocks-m) has 10 to 17 blocks (resp. up to 20 blocks). |
| Hardware Specification | Yes | trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours |
| Software Dependencies | No | The paper mentions "implemented the architectures in PyTorch" and "using Adam", but does not provide specific version numbers for PyTorch or any other libraries or frameworks. |
| Experiment Setup | Yes | For learning value functions, we implemented the architectures in PyTorch, and trained the models on NVIDIA A10 GPUs with 24 GB of memory over 12 hours, using Adam (Kingma and Ba 2015) with a learning rate of 0.0002, batches of size 16, and without applying any regularization loss. We used embedding dimension k = 64, L = 30 layers for R-GNN, R-GNN[t] and the ETs. For R-GNN2, we used k = 32 to avoid running out of memory during training. In all approaches, all layers share weights, and the ETs have 8 self-attention heads. |
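The Experiment Setup row above lists concrete hyperparameters (Adam with learning rate 0.0002, batch size 16, embedding dimension k = 64, L = 30 weight-shared layers, no regularization loss). The following minimal PyTorch sketch shows how those reported settings translate into code; the `SharedLayerNet` module and its inner MLP layer are placeholders of our own, standing in for the paper's actual R-GNN layer (given in its Algorithm 1), not the authors' implementation.

```python
# Hypothetical sketch of the reported training configuration:
# Adam, lr = 2e-4, batch size 16, k = 64, L = 30 weight-shared layers.
import torch
import torch.nn as nn

class SharedLayerNet(nn.Module):
    """Placeholder network: one layer applied L times ("all layers share weights")."""

    def __init__(self, k: int = 64, num_layers: int = 30):
        super().__init__()
        self.num_layers = num_layers
        # A single layer module reused at every step, so parameters are shared
        # across all L layers, as the paper specifies for all its architectures.
        self.layer = nn.Sequential(nn.Linear(k, k), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_layers):
            x = self.layer(x)
        return x

model = SharedLayerNet(k=64, num_layers=30)
# Adam at the reported learning rate; no weight decay, matching
# "without applying any regularization loss".
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

batch = torch.randn(16, 64)  # batch size 16, embedding dimension k = 64
out = model(batch)
print(tuple(out.shape))
```

Because the single shared layer is applied 30 times, the parameter count stays that of one k-by-k layer regardless of depth, which is consistent with the paper reducing k (to 32 for R-GNN2) rather than L when memory ran out.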