Classical Planning in Deep Latent Space
Authors: Masataro Asai, Hiroshi Kajino, Alex Fukunaga, Christian Muise
JAIR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of Lights Out. Section 10 presents empirical evaluations of the accuracy and stability of the SAE, as well as the action model accuracy of AMA3+ and AMA4+. Section 11 presents empirical evaluation of end-to-end planning with Latplan, including the effectiveness of standard planning heuristics. |
| Researcher Affiliation | Collaboration | Masataro Asai (masataro.asai@ibm.com), MIT-IBM Watson AI Lab, IBM Research, Cambridge, USA; Hiroshi Kajino (kajino@jp.ibm.com), IBM Research Tokyo, Tokyo, Japan; Alex Fukunaga (fukunaga@idea.c.u-tokyo.ac.jp), Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan; Christian Muise (christian.muise@queensu.ca), School of Computing, Queen's University, Kingston, Canada |
| Pseudocode | Yes | Algorithm 1: An abstract pipeline of the Latplan framework. Training phase — Require: dataset X, untrained machine learning model M. 1: Trained model M′ ← TRAIN(M, X); 2: M′ provides functions ENCODE and DECODE; 3: PDDL domain file D ← GENERATEDOMAIN(M′); 4: return M′, D. Planning phase — Require: M′, D, initial state observation x_I, goal state observation x_G. 1: Encode x_I, x_G into propositional states z_I, z_G; 2: PDDL problem file P ← GENERATEPROBLEM(z_I, z_G); 3: Plan π = (a_0, a_1, …) ← SOLVE(P, D) using a planner (e.g., Fast Downward); 4: State trace (z_I = z_0, z_1 = a_0(z_0), z_2 = a_1(z_1), …, z_G) ← SIMULATE(π, z_I, D) using a plan validator for PDDL, e.g., VAL (Howey & Long, 2003); 5: return decoded observation trace (x_I = x_0, x_1, x_2, …, x_G). |
| Open Source Code | Yes | The source code of these validators is available in the Latplan repository: https://github.com/guicho271828/latplan/ |
| Open Datasets | Yes | MNIST 8-Puzzle (Asai & Fukunaga, 2018) is a 42x42 pixel, monochrome image-based version of the 8-Puzzle. Tiles contain hand-written digits (0-9) from the MNIST database (LeCun et al., 1998), which are shrunk to 14x14 pixels so that each state of the puzzle is a 42x42 image. The Scrambled Photograph (Mandrill) 15-Puzzle cuts and scrambles a real photograph, similar to the puzzles sold in stores; we used a Mandrill image taken from the USC-SIPI benchmark set (Weber, 1997). Photo-realistic Blocksworld (Asai, 2018) is a dataset that consists of 100x150 RGB images rendered by the Blender 3D engine. Sokoban (Culberson, 1998; Junghanns & Schaeffer, 2001) is a PSPACE-hard puzzle domain whose 112x112 pixel visualizations are obtained from the PDDLGym library (Silver & Chitnis, 2020). |
| Dataset Splits | Yes | These datasets are divided into 90%, 5%, and 5% for the training set, validation/tuning set, and testing set, respectively. |
| Hardware Specification | Yes | All experiments are performed on a distributed compute cluster equipped with NVIDIA Tesla K80 / V100 / A100 GPUs and Xeon E5-2600 v4 CPUs. |
| Software Dependencies | No | The system is implemented on top of the Keras library (Chollet et al., 2015). We ran the off-the-shelf planner Fast Downward on the PDDL generated by our system. The optimizer is Rectified Adam (Liu et al., 2019). The paper mentions software such as Keras, Fast Downward, and Rectified Adam, but it does not provide specific version numbers for any of these components, which is required for reproducibility. |
| Experiment Setup | Yes | Table 10.2 (hyperparameters). Training parameters — optimizer: Rectified Adam (Liu et al., 2019); training epochs: 2000; batch size: 400; learning rate: 10^-3; gradient norm clipping: 0.1. Gumbel-Softmax / Binary Concrete annealing — initial temperature τmax = 5; final temperature τmin = 0.5; annealing schedule for epoch t: τ(t) = τmax (τmin/τmax)^(min(t,1000)/1000). Network shape — latent space dimension F ∈ {50, 100, 300}; maximum number of actions A = 6000. Loss and regularization — σ for all reconstruction losses (cf. Section 2.3): 0.1; β1 for D_KL(q(z_i,0 \| x_i,0) ‖ p(z_i,0)): β1 ∈ {1, 10}; β2 for D_KL(q(a_i \| x_i,0, x_i,1) ‖ p(a_i \| z_i,0)): 1; β3 for D_KL(q(z_i,1 \| x_i,1) ‖ p(z_i,1 \| z_i,0, a_i)): β3 ∈ {1, 10, 100, 1000, 10000}. |
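The two-phase pipeline quoted in the Pseudocode row (Algorithm 1) can be sketched as follows. This is a toy illustration, not the paper's implementation: every class and function here (`ToyModel`, `generate_domain`, `solve`, `simulate`) is a hypothetical stand-in, with a trained State AutoEncoder replaced by a thresholding encoder, the PDDL domain replaced by bit-flip actions, and Fast Downward / VAL replaced by trivial substitutes.

```python
# Hedged sketch of Latplan's Algorithm 1 (training phase, then planning phase).
# All names are hypothetical stand-ins for the real SAE / PDDL / planner stack.

class ToyModel:
    """Stands in for the trained SAE: maps observations to propositional states."""
    def encode(self, x):
        # observation (list of floats) -> binary latent state
        return tuple(1 if v > 0.5 else 0 for v in x)

    def decode(self, z):
        # binary latent state -> reconstructed observation
        return [float(b) for b in z]

def generate_domain(model):
    # In Latplan this emits a PDDL domain file; here, actions flip one bit each.
    return [("flip", i) for i in range(3)]

def solve(z_init, z_goal, domain):
    # Stand-in for an off-the-shelf planner: flip every bit that differs.
    return [("flip", i) for i, (a, b) in enumerate(zip(z_init, z_goal)) if a != b]

def simulate(plan, z_init):
    # Stand-in for plan validation (VAL): apply each action to the latent state.
    trace = [z_init]
    for _, i in plan:
        z = list(trace[-1])
        z[i] ^= 1
        trace.append(tuple(z))
    return trace

model = ToyModel()
domain = generate_domain(model)                      # "training phase" output
z_i = model.encode([0.9, 0.1, 0.9])                  # encode initial observation
z_g = model.encode([0.1, 0.1, 0.1])                  # encode goal observation
plan = solve(z_i, z_g, domain)                       # "planning phase"
trace = simulate(plan, z_i)                          # latent state trace
observations = [model.decode(z) for z in trace]      # decoded observation trace
```

The structure mirrors the quoted algorithm: training yields a model plus a domain description, and planning encodes observations, solves symbolically, validates, and decodes the trace back to images.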
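The 90%/5%/5% split quoted in the Dataset Splits row can be reproduced with a few lines. The helper name below is hypothetical; the paper does not specify how the split is implemented.

```python
# Minimal sketch of the 90/5/5 train/validation/test split described in the
# paper. `split_dataset` is a hypothetical helper, not Latplan's own code.

def split_dataset(data, train_frac=0.90, val_frac=0.05):
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    # Remaining examples (the final ~5%) form the test set.
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]
```

In practice the data would be shuffled before splitting so all three sets are drawn from the same distribution.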
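The annealing schedule in Table 10.2 interpolates the Gumbel-Softmax temperature from τmax = 5 down to τmin = 0.5 over the first 1000 epochs. A small sketch, assuming exponential interpolation of the form τ(t) = τmax (τmin/τmax)^(min(t,1000)/1000) as reconstructed from the quoted fragments:

```python
# Hedged sketch of the temperature annealing schedule from Table 10.2,
# assuming exponential interpolation over a 1000-epoch horizon.

def tau(t, tau_max=5.0, tau_min=0.5, horizon=1000):
    # Decays from tau_max at t=0 to tau_min at t=horizon, then stays constant.
    return tau_max * (tau_min / tau_max) ** (min(t, horizon) / horizon)
```

Under this schedule the temperature starts at 5.0, reaches 0.5 at epoch 1000, and is held at 0.5 for the remaining 1000 training epochs.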