Classical Planning in Deep Latent Space
Authors: Masataro Asai, Hiroshi Kajino, Alex Fukunaga, Christian Muise
JAIR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of Lights Out. Section 10 presents empirical evaluations of the accuracy and stability of the SAE, as well as the action model accuracy of AMA3+ and AMA4+. Section 11 presents empirical evaluation of end-to-end planning with Latplan, including the effectiveness of standard planning heuristics. |
| Researcher Affiliation | Collaboration | Masataro Asai (masataro.asai@ibm.com), MIT-IBM Watson AI Lab, IBM Research, Cambridge, USA; Hiroshi Kajino (kajino@jp.ibm.com), IBM Research Tokyo, Tokyo, Japan; Alex Fukunaga (fukunaga@idea.c.u-tokyo.ac.jp), Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan; Christian Muise (christian.muise@queensu.ca), School of Computing, Queen's University, Kingston, Canada |
| Pseudocode | Yes | Algorithm 1: An abstract pipeline of the Latplan framework. Training phase — Require: dataset X, untrained machine learning model M. 1: Trained model M′ ← TRAIN(M, X); 2: M′ provides functions ENCODE and DECODE; 3: PDDL domain file D ← GENERATEDOMAIN(M′); 4: return M′, D. Planning phase — Require: M′, D, initial state observation x_I, goal state observation x_G. 1: Encode x_I, x_G into propositional states z_I, z_G; 2: PDDL problem file P ← GENERATEPROBLEM(z_I, z_G); 3: Plan π = (a_0, a_1, …) ← SOLVE(P, D) using a planner (e.g., Fast Downward); 4: State trace (z_I = z_0, z_1 = a_0(z_0), z_2 = a_1(z_1), …, z_G) ← SIMULATE(π, z_I, D) using a plan validator for PDDL, e.g., VAL (Howey & Long, 2003); 5: return decoded observation trace (x_I = x_0, x_1, x_2, …, x_G). |
| Open Source Code | Yes | The source code of these validators is available in the Latplan repository: https://github.com/guicho271828/latplan/ |
| Open Datasets | Yes | MNIST 8-Puzzle (Asai & Fukunaga, 2018) is a 42x42 pixel, monochrome image-based version of the 8-Puzzle. Tiles contain hand-written digits (0-9) from the MNIST database (LeCun et al., 1998), which are shrunk to 14x14 pixels so that each state of the puzzle is a 42x42 image. The Scrambled Photograph (Mandrill) 15-Puzzle cuts and scrambles a real photograph, similar to the puzzles sold in stores; we used a Mandrill image taken from the USC-SIPI benchmark set (Weber, 1997). Photo-realistic Blocksworld (Asai, 2018) is a dataset that consists of 100x150 RGB images rendered by the Blender 3D engine. Sokoban (Culberson, 1998; Junghanns & Schaeffer, 2001) is a PSPACE-hard puzzle domain whose 112x112 pixel visualizations are obtained from the PDDLGym library (Silver & Chitnis, 2020). |
| Dataset Splits | Yes | These datasets are divided into 90%, 5%, and 5% for the training set, validation/tuning set, and testing set, respectively. |
| Hardware Specification | Yes | All experiments are performed on a distributed compute cluster equipped with NVIDIA Tesla K80 / V100 / A100 GPUs and Xeon E5-2600 v4 CPUs. |
| Software Dependencies | No | The system is implemented on top of the Keras library (Chollet et al., 2015). We ran the off-the-shelf planner Fast Downward on the PDDL generated by our system. The optimizer is Rectified Adam (Liu et al., 2019). The paper mentions software such as Keras, Fast Downward, and Rectified Adam, but it does not provide specific version numbers for any of these components, which is required for reproducibility. |
| Experiment Setup | Yes | Table 10.2 (hyperparameters). Training parameters — optimizer: Rectified Adam (Liu et al., 2019); training epochs: 2000; batch size: 400; learning rate: 10^-3; gradient norm clipping: 0.1. Gumbel-Softmax / Binary Concrete annealing — initial temperature τmax = 5; final temperature τmin = 0.5; annealing schedule for epoch t: τ(t) = τmax (τmin/τmax)^(min(t,1000)/1000). Network shape — latent space dimension F ∈ {50, 100, 300}; maximum number of actions A = 6000. Loss and regularization — σ for all reconstruction losses (cf. Section 2.3): 0.1; β1 for D_KL(q(z_i,0 \| x_i,0) ‖ p(z_i,0)): β1 ∈ {1, 10}; β2 for D_KL(q(a_i \| x_i,0, x_i,1) ‖ p(a_i \| z_i,0)): 1; β3 for D_KL(q(z_i,1 \| x_i,1) ‖ p(z_i,1 \| z_i,0, a_i)): β3 ∈ {1, 10, 100, 1000, 10000}. |
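The two-phase pipeline quoted in the Pseudocode row (Algorithm 1) can be sketched as follows. This is a toy illustration, not the paper's implementation: every class and function here (`ToyModel`, `generate_domain`, `solve`, `simulate`) is a hypothetical stand-in, with a trained State AutoEncoder replaced by a thresholding encoder, the PDDL domain replaced by bit-flip actions, and Fast Downward / VAL replaced by trivial substitutes.

```python
# Hedged sketch of Latplan's Algorithm 1 (training phase, then planning phase).
# All names are hypothetical stand-ins for the real SAE / PDDL / planner stack.

class ToyModel:
    """Stands in for the trained SAE: maps observations to propositional states."""
    def encode(self, x):
        # observation (list of floats) -> binary latent state
        return tuple(1 if v > 0.5 else 0 for v in x)

    def decode(self, z):
        # binary latent state -> reconstructed observation
        return [float(b) for b in z]

def generate_domain(model):
    # In Latplan this emits a PDDL domain file; here, actions flip one bit each.
    return [("flip", i) for i in range(3)]

def solve(z_init, z_goal, domain):
    # Stand-in for an off-the-shelf planner: flip every bit that differs.
    return [("flip", i) for i, (a, b) in enumerate(zip(z_init, z_goal)) if a != b]

def simulate(plan, z_init):
    # Stand-in for plan validation (VAL): apply each action to the latent state.
    trace = [z_init]
    for _, i in plan:
        z = list(trace[-1])
        z[i] ^= 1
        trace.append(tuple(z))
    return trace

model = ToyModel()
domain = generate_domain(model)                      # "training phase" output
z_i = model.encode([0.9, 0.1, 0.9])                  # encode initial observation
z_g = model.encode([0.1, 0.1, 0.1])                  # encode goal observation
plan = solve(z_i, z_g, domain)                       # "planning phase"
trace = simulate(plan, z_i)                          # latent state trace
observations = [model.decode(z) for z in trace]      # decoded observation trace
```

The structure mirrors the quoted algorithm: training yields a model plus a domain description, and planning encodes observations, solves symbolically, validates, and decodes the trace back to images.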
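The 90%/5%/5% split quoted in the Dataset Splits row can be reproduced with a few lines. The helper name below is hypothetical; the paper does not specify how the split is implemented.

```python
# Minimal sketch of the 90/5/5 train/validation/test split described in the
# paper. `split_dataset` is a hypothetical helper, not Latplan's own code.

def split_dataset(data, train_frac=0.90, val_frac=0.05):
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    # Remaining examples (the final ~5%) form the test set.
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]
```

In practice the data would be shuffled before splitting so all three sets are drawn from the same distribution.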
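The annealing schedule in Table 10.2 interpolates the Gumbel-Softmax temperature from τmax = 5 down to τmin = 0.5 over the first 1000 epochs. A small sketch, assuming exponential interpolation of the form τ(t) = τmax (τmin/τmax)^(min(t,1000)/1000) as reconstructed from the quoted fragments:

```python
# Hedged sketch of the temperature annealing schedule from Table 10.2,
# assuming exponential interpolation over a 1000-epoch horizon.

def tau(t, tau_max=5.0, tau_min=0.5, horizon=1000):
    # Decays from tau_max at t=0 to tau_min at t=horizon, then stays constant.
    return tau_max * (tau_min / tau_max) ** (min(t, horizon) / horizon)
```

Under this schedule the temperature starts at 5.0, reaches 0.5 at epoch 1000, and is held at 0.5 for the remaining 1000 training epochs.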