ASNets: Deep Learning for Generalised Planning
Authors: Sam Toyer, Sylvie Thiébaux, Felipe Trevizan, Lexing Xie
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. |
| Researcher Affiliation | Academia | Sam Toyer EMAIL Department of Electrical Engineering and Computer Sciences University of California, Berkeley 2121 Berkeley Way, Berkeley CA 94720-1660, USA Sylvie Thiébaux EMAIL Felipe Trevizan EMAIL Lexing Xie EMAIL College of Engineering and Computer Science The Australian National University 145 Science Road, Canberra ACT 2601, Australia |
| Pseudocode | Yes | Algorithm 1 Learning ASNet weights θ from a set of training problems Ptrain. |
| Open Source Code | Yes | Code for our experiments is available on GitHub.6 https://github.com/qxcv/asnets |
| Open Datasets | Yes | All instances for this task (as well as Exploding and Deterministic Blocksworld) were generated by the algorithm from Slaney and Thiébaux (2001). |
| Dataset Splits | Yes | We train Triangle Tireworld policies on three problems of sizes 1–3, and test on 17 problems of sizes 4–20. We train Cosa Nostra Pizza policies on five problems with 1–5 toll booths, and test on 17 problems with 6–50 toll booths. We train Probabilistic Blocksworld policies on 25 problems with 5–9 blocks, and test on 30 problems with 15–40 blocks. |
| Hardware Specification | Yes | each run was restricted to a single core of an Intel Xeon Platinum 8175 processor attached to an Amazon AWS r5.12xlarge instance, with 16GB of memory available per run. |
| Software Dependencies | No | The paper mentions "Ray Tune automated hyperparameter tuning framework (Liaw et al., 2018) and the random forest optimiser from scikit-optimize" and "Adam optimiser", but does not provide specific version numbers for these software packages or libraries. |
| Experiment Setup | Yes | Our networks have two proposition layers and three action layers (i.e. L = 2), with dh = 16 output channels for each action or proposition module. Training is divided into a series of epochs... More specifically, at the beginning of each epoch, up to Texplore = 70/|Ptrain| trajectories are sampled... Ttrain = 700 batches of network optimisation... minibatch... 64 samples... Adam optimiser (β1 = 0.9, β2 = 0.999, ϵ = 10−8) with a learning rate of 10−3. We apply an ℓ2 regulariser of 2 × 10−4 to prevent weights from exploding, and dropout probability of 0.1 for all layers. |
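The Experiment Setup row quotes concrete optimiser hyperparameters (Adam with β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸, learning rate 10⁻³, and an ℓ2 regulariser of 2 × 10⁻⁴). As a minimal illustrative sketch, the update below implements a single Adam step with those quoted constants and the ℓ2 penalty folded into the gradient; this is a standalone scalar re-implementation for clarity, not the paper's actual training code (which uses a deep learning framework's built-in optimiser).

```python
import math

# Hyperparameters quoted from the paper's experiment setup.
LR = 1e-3                       # learning rate
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8
L2_REG = 2e-4                   # l2 regulariser coefficient

def adam_step(theta, grad, m, v, t):
    """One Adam update on a scalar parameter, with l2 regularisation
    added to the raw gradient. t is the 1-based step count."""
    g = grad + L2_REG * theta            # gradient of loss + l2 penalty
    m = BETA1 * m + (1 - BETA1) * g      # biased first-moment estimate
    v = BETA2 * v + (1 - BETA2) * g * g  # biased second-moment estimate
    m_hat = m / (1 - BETA1 ** t)         # bias-corrected moments
    v_hat = v / (1 - BETA2 ** t)
    theta = theta - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v

# First step: the bias-corrected update is close to LR * sign(grad).
theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the very first step the bias correction makes m_hat ≈ g and sqrt(v_hat) ≈ |g|, so the parameter moves by almost exactly the learning rate, which is characteristic of Adam's scale-invariant updates.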