ASNets: Deep Learning for Generalised Planning

Authors: Sam Toyer, Sylvie Thiébaux, Felipe Trevizan, Lexing Xie

JAIR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study."
Researcher Affiliation | Academia | Sam Toyer (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, 2121 Berkeley Way, Berkeley CA 94720-1660, USA; Sylvie Thiébaux (EMAIL), Felipe Trevizan (EMAIL), and Lexing Xie (EMAIL), College of Engineering and Computer Science, The Australian National University, 145 Science Road, Canberra ACT 2601, Australia.
Pseudocode | Yes | Algorithm 1: Learning ASNet weights θ from a set of training problems P_train.
Open Source Code | Yes | "Code for our experiments is available on GitHub": https://github.com/qxcv/asnets
Open Datasets | Yes | "All instances for this task (as well as Exploding and Deterministic Blocksworld) were generated by the algorithm from Slaney and Thiébaux (2001)."
Dataset Splits | Yes | "We train Triangle Tireworld policies on three problems of sizes 1–3, and test on 17 problems of sizes 4–20. We train Cosa Nostra Pizza policies on five problems with 1–5 toll booths, and test on 17 problems with 6–50 toll booths. We train Probabilistic Blocksworld policies on 25 problems with 5–9 blocks, and test on 30 problems with 15–40 blocks."
Hardware Specification | Yes | "each run was restricted to a single core of an Intel Xeon Platinum 8175 processor attached to an Amazon AWS r5.12xlarge instance, with 16GB of memory available per run."
Software Dependencies | No | The paper mentions the "Ray Tune automated hyperparameter tuning framework (Liaw et al., 2018) and the random forest optimiser from scikit-optimize", as well as the "Adam optimiser", but does not provide version numbers for these software packages or libraries.
Experiment Setup | Yes | "Our networks have two proposition layers and three action layers (i.e. L = 2), with d_h = 16 output channels for each action or proposition module. Training is divided into a series of epochs... More specifically, at the beginning of each epoch, up to T_explore = 70/|P_train| trajectories are sampled... T_train = 700 batches of network optimisation... minibatch... 64 samples... Adam optimiser (β1 = 0.9, β2 = 0.999, ε = 10^-8) with a learning rate of 10^-3. We apply an ℓ2 regulariser of 2 × 10^-4 to prevent weights from exploding, and dropout probability of 0.1 for all layers."
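To make the quoted setup concrete, the sketch below collects the reported hyperparameters into a plain-Python configuration and computes the per-epoch trajectory budget. This is an illustrative sketch only, not the authors' TensorFlow implementation: the dictionary keys and the `trajectories_per_epoch` helper are hypothetical names, and the reading of T_explore as a per-epoch sampling budget is an assumption based on the quoted text.

```python
# Hypothetical configuration mirroring the hyperparameters quoted above.
# Numeric values come from the paper's quoted setup; names are placeholders.
HYPERPARAMS = {
    "proposition_layers": 2,   # L = 2 proposition layers
    "action_layers": 3,        # L + 1 = 3 action layers
    "hidden_channels": 16,     # d_h output channels per module
    "batches_per_epoch": 700,  # T_train optimisation batches per epoch
    "minibatch_size": 64,
    "learning_rate": 1e-3,     # Adam learning rate
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "l2_regulariser": 2e-4,    # keeps weights from exploding
    "dropout": 0.1,            # applied to all layers
}


def trajectories_per_epoch(num_training_problems: int) -> int:
    """T_explore = 70 / |P_train|: the per-epoch trajectory sampling
    budget shrinks as the training set grows (assumed interpretation
    of the quoted schedule)."""
    return 70 // num_training_problems


# e.g. with the 25 Probabilistic Blocksworld training problems:
print(trajectories_per_epoch(25))  # 2 trajectories per epoch
```

With five training problems (the Cosa Nostra Pizza split), the same formula gives 14 trajectories per epoch, so smaller training sets are explored more intensively.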