Classical Planning in Deep Latent Space

Authors: Masataro Asai, Hiroshi Kajino, Alex Fukunaga, Christian Muise

JAIR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of LightsOut. Section 10 presents empirical evaluations of the accuracy and stability of the SAE, as well as the action model accuracy of AMA3+ and AMA4+. Section 11 presents an empirical evaluation of end-to-end planning with Latplan, including the effectiveness of standard planning heuristics.
Researcher Affiliation Collaboration Masataro Asai (MASATARO.ASAI@IBM.COM), MIT-IBM Watson AI Lab, IBM Research, Cambridge, USA; Hiroshi Kajino (KAJINO@JP.IBM.COM), IBM Research Tokyo, Tokyo, Japan; Alex Fukunaga (FUKUNAGA@IDEA.C.U-TOKYO.AC.JP), Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan; Christian Muise (CHRISTIAN.MUISE@QUEENSU.CA), School of Computing, Queen's University, Kingston, Canada
Pseudocode Yes Algorithm 1: An abstract pipeline of the Latplan framework.
Training Phase:
Require: Dataset X, untrained machine learning model M
1: Trained model M′ ← TRAIN(M, X)
2: M′ provides functions ENCODE and DECODE.
3: PDDL domain file D ← GENERATEDOMAIN(M′)
4: return M′, D
Planning Phase:
Require: M′, D, initial state observation x_I, goal state observation x_G
1: Encode x_I, x_G into propositional states z_I, z_G
2: PDDL problem file P ← GENERATEPROBLEM(z_I, z_G)
3: Plan π = (a_0, a_1, ...) ← SOLVE(P, D) using a planner (e.g., Fast Downward)
4: State trace (z_I = z_0, z_1 = a_0(z_0), z_2 = a_1(z_1), ..., z_G) ← SIMULATE(π, z_I, D) using a plan validator for PDDL, e.g., VAL (Howey & Long, 2003).
5: return decoded observation trace (x_I = x_0, x_1, x_2, ..., x_G).
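As a reading aid, the two phases of Algorithm 1 can be sketched in Python. Every name here (`train`, `encode`, `decode`, `generate_domain`, `plan_and_simulate`) is an illustrative placeholder rather than the actual Latplan API, and the stub "model" simply thresholds pixel intensities into propositional bits instead of running a learned SAE and a real planner.

```python
# Toy sketch of Algorithm 1's two phases. All names are illustrative
# placeholders, not the actual Latplan API.

def train(model, dataset):
    # Stand-in for SAE training: real Latplan learns ENCODE/DECODE here.
    return model

def encode(model, x):
    # Propositionalize an observation into a tuple of binary latent bits.
    return tuple(int(v > model["threshold"]) for v in x)

def decode(model, z):
    # Map latent bits back to an (approximate) observation.
    return [float(b) for b in z]

def generate_domain(model):
    # Stand-in for GENERATEDOMAIN: would emit a PDDL domain string.
    return "(define (domain latent) ...)"

def plan_and_simulate(model, x_init, x_goal):
    # Planning phase: encode the endpoints, solve, decode the state trace.
    z_init, z_goal = encode(model, x_init), encode(model, x_goal)
    # A real system calls SOLVE (e.g., Fast Downward) and SIMULATE (VAL);
    # here we fake a one-step plan whose trace just connects the endpoints.
    state_trace = [z_init, z_goal]
    return [decode(model, z) for z in state_trace]

model = train({"threshold": 0.5}, dataset=None)
domain = generate_domain(model)
trace = plan_and_simulate(model, x_init=[0.9, 0.1], x_goal=[0.1, 0.9])
```

The point of the sketch is the division of labor: everything learned lives in the training phase, while the planning phase is symbolic and reuses off-the-shelf PDDL tools.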
Open Source Code Yes The source code of these validators is available in the Latplan repository: https://github.com/guicho271828/latplan/
Open Datasets Yes MNIST 8-Puzzle (Asai & Fukunaga, 2018) is a 42x42 pixel, monochrome image-based version of the 8-Puzzle. Tiles contain hand-written digits (0-9) from the MNIST database (LeCun et al., 1998), which are shrunk to 14x14 pixels so that each state of the puzzle is a 42x42 image. The Scrambled Photograph (Mandrill) 15-Puzzle cuts and scrambles a real photograph, similar to the puzzles sold in stores. We used a Mandrill image taken from the USC-SIPI benchmark set (Weber, 1997). Photo-realistic Blocksworld (Asai, 2018) is a dataset that consists of 100x150 RGB images rendered by the Blender 3D engine. Sokoban (Culberson, 1998; Junghanns & Schaeffer, 2001) is a PSPACE-hard puzzle domain whose 112x112 pixel visualizations are obtained from the PDDLGym library (Silver & Chitnis, 2020).
Dataset Splits Yes These datasets are divided into 90%, 5%, and 5% for the training set, validation/tuning set, and testing set, respectively.
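The stated 90/5/5 split can be reproduced with a simple shuffled index partition. This is a minimal sketch under the assumption of a uniform random split; the paper does not specify the actual splitting code or seed.

```python
import random

def split_90_5_5(items, seed=0):
    # Shuffle and partition into 90% train / 5% validation / 5% test,
    # matching the proportions stated above (exact procedure is assumed).
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(n * 0.90), int(n * 0.05)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_90_5_5(range(1000))
```

Fixing the seed makes the partition deterministic, which matters when tuning hyperparameters against the validation set.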
Hardware Specification Yes All experiments are performed on a distributed compute cluster equipped with NVIDIA Tesla K80 / V100 / A100 GPUs and Intel Xeon E5-2600 v4 CPUs.
Software Dependencies No The system is implemented on top of the Keras library (Chollet et al., 2015). We ran the off-the-shelf planner Fast Downward on the PDDL generated by our system. The optimizer is Rectified Adam (Liu et al., 2019). The paper mentions software such as Keras, Fast Downward, and Rectified Adam, but it does not provide version numbers for any of these components, which reproducibility requires.
Experiment Setup Yes Table 10.2: List of hyperparameters.
Training parameters:
- Optimizer: Rectified Adam (Liu et al., 2019)
- Training epochs: 2000
- Batch size: 400
- Learning rate: 10^-3
- Gradient norm clipping: 0.1
Gumbel-Softmax / Binary Concrete annealing parameters:
- Initial annealing temperature τmax: 5
- Final annealing temperature τmin: 0.5
- Annealing schedule for epoch t: τ(t) = τmax (τmin/τmax)^(min(t,1000)/1000)
Network shape parameters:
- Latent space dimension F: F ∈ {50, 100, 300}
- Maximum number of actions A: A = 6000
Loss and regularization parameters:
- σ for all reconstruction losses (cf. Section 2.3): 0.1
- β1 for DKL(q(z_{i,0} | x_{i,0}) ‖ p(z_{i,0})): β1 ∈ {1, 10}
- β2 for DKL(q(a_i | x_{i,0}, x_{i,1}) ‖ p(a_i | z_{i,0})): 1
- β3 for DKL(q(z_{i,1} | x_{i,1}) ‖ p(z_{i,1} | z_{i,0}, a_i)): β3 ∈ {1, 10, 100, 1000, 10000}
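The annealing schedule reads as an exponential interpolation from τmax to τmin over the first 1000 epochs, after which the temperature is held constant. A minimal sketch under that reading (the exact formula is reconstructed from the listed parameters, not quoted verbatim):

```python
TAU_MAX, TAU_MIN, ANNEAL_EPOCHS = 5.0, 0.5, 1000

def temperature(epoch):
    # Exponential annealing from TAU_MAX down to TAU_MIN over the first
    # ANNEAL_EPOCHS epochs, then held constant: an assumed reading of
    # tau(t) = tau_max * (tau_min/tau_max) ** (min(t, 1000) / 1000).
    frac = min(epoch, ANNEAL_EPOCHS) / ANNEAL_EPOCHS
    return TAU_MAX * (TAU_MIN / TAU_MAX) ** frac
```

A high initial temperature keeps the Gumbel-Softmax / Binary Concrete relaxation smooth for gradient flow early in training, while the low final temperature pushes the latent units toward near-discrete 0/1 values.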