reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expressing and Exploiting Subgoal Structure in Classical Planning Using Sketches

Authors: Dominik Drexler, Jendrik Seipp, Hector Geffner

JAIR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6. Experiments Even though the focus of our work is on proving polynomial runtime bounds for planning domains theoretically, we evaluate in this section how these runtime guarantees translate into practice. We implemented two versions of SIWR: one version, denoted by SIWG R, is based on the LAPKT planning system (Ramirez, Lipovetzky, & Muise, 2015) and grounds the input task to a propositional representation before the search, the other version, denoted by SIWL R, is implemented in the Mimir planning system (St ahlberg, 2023) and searches on the lifted task representation directly. Both versions use the DLPlan library (Drexler, Franc es, & Seipp, 2022) to represent and evaluate features (see the appendix for details). We use the Lab toolkit (Seipp, Pommerening, Sievers, & Helmert, 2017) for running experiments on Intel Xeon Gold 6130 CPU cores. For each planner run, we limit time and memory by 30 minutes and 3 Gi B. We collected benchmark tasks for the domains analyzed above from two different sources. The first source is the satisficing track of previous IPCs. Since many of these tasks are trivial for state-of-the-art planners, we also consider tasks from the new Autoscale benchmark set (Torralba, Seipp, & Sievers, 2021). ... Table 1 shows results for the five planners. In addition to the number of solved tasks and planner runtimes, which we discuss below, for the planners based on SIW, the table also holds data about the effective width.
Researcher Affiliation	Academia	Dominik Drexler EMAIL Jendrik Seipp EMAIL Link oping University, Sweden Hector Geffner EMAIL RWTH Aachen University, Germany Link oping University, Sweden
Pseudocode	No	The paper describes the SIWR algorithm in text, stating: "The SIWR algorithm is a variant of the SIW algorithm that uses a given sketch R = RΦ for solving problems P Q. SIWR starts at the state s := s0, where s0 is the initial state of P, and then performs an IW search to find a state s that is closest from s such that s is a goal state of P or a subgoal state in Gr(s) for some sketch rule r in R. If s is not a goal state, then s is set to s , s := s , and the loop repeats until a goal state is reached." However, this is not presented as a structured pseudocode block or algorithm figure.
Open Source Code	Yes	Fourth, we release the code for constructing and evaluating description logic features in the form of an open-source C++ library (with Python bindings), called DLPlan. ... The code and data are available online (Drexler, Seipp, & Geffner, 2023a).
Open Datasets	Yes	We collected benchmark tasks for the domains analyzed above from two different sources. The first source is the satisficing track of previous IPCs. Since many of these tasks are trivial for state-of-the-art planners, we also consider tasks from the new Autoscale benchmark set (Torralba, Seipp, & Sievers, 2021).
Dataset Splits	No	The paper mentions collecting "benchmark tasks" for various domains from "previous IPCs" and the "Autoscale benchmark set (Torralba, Seipp, & Sievers, 2021)", but does not provide specific details on how these tasks are split into training, validation, or test sets, nor does it specify exact percentages, sample counts, or a detailed splitting methodology.
Hardware Specification	Yes	We use the Lab toolkit (Seipp, Pommerening, Sievers, & Helmert, 2017) for running experiments on Intel Xeon Gold 6130 CPU cores.
Software Dependencies	No	The paper mentions using the LAPKT planning system, the Mimir planning system, the DLPlan library, and the Lab toolkit, but does not provide specific version numbers for these software components, which are necessary for full reproducibility.
Experiment Setup	Yes	For each planner run, we limit time and memory by 30 minutes and 3 Gi B. We use a width bound of k = 2 because SIW(k) and SIWR(k) are too slow to compute in practice for larger values of k. For more robust and comparable results, we always randomly shuffle the applicable actions of a state before generating its successor states.