Contrastive Explanations of Plans through Model Restrictions

Authors: Benjamin Krarup, Senka Krivic, Daniele Magazzeni, Derek Long, Michael Cashmore, David E. Smith

JAIR 2021

Reproducibility Variable Result LLM Response
Research Type: Experimental. Our evaluation falls into two parts: we evaluate the performance of the compilation of constraints by examining the planning time and plan quality produced for a large sample of problems, and we also present the user study that explores the value of the iterative process of plan explanation. The latter evaluation is based on observed interactions with an implemented system and is, therefore, more qualitative in style than the former evaluation. Nevertheless, both evaluations together serve to support our claims that the approach we have described provides a paradigm that allows users to usefully explore explanations of plans, by asking contrastive questions and being supplied plans in response to the constraints implied by those questions. ... We conducted a study with 20 volunteers (5 students, 4 engineers, 4 software developers, 3 researchers, 2 assistant professors, a chemist and a copywriter) divided into two groups of 10 persons each. Participants' ages ranged from 23 to 43 years, with 35% identifying as female and 65% identifying as male. The average time G1 spent with the plan and the framework is 24.2 minutes, and on average, they asked 5.1 questions. The average time G2 spent with the plan and the framework is 21.1 minutes, and on average, they asked 3.7 questions, as can be seen in Figure 30. The maximum number of questions asked was 10, while the minimum was 1.
Researcher Affiliation: Collaboration. Benjamin Krarup (EMAIL), Senka Krivic (EMAIL), Daniele Magazzeni (EMAIL), Derek Long (EMAIL): King's College London, Bush House, WC2B 4BG, London, UK; Michael Cashmore (EMAIL): University of Strathclyde, Livingstone Tower, G1 1XH, Glasgow, UK; David E. Smith (EMAIL): PS Research, 25960 Quail Ln, Los Altos Hills, CA 94022, USA.
Pseudocode: No. The paper formally defines planning models and compilations using mathematical notation and PDDL2.1 syntax fragments (e.g., Figures 9, 15, 17), but it does not include a dedicated section or block explicitly labeled as "Pseudocode" or "Algorithm" for the overall methodology.
Open Source Code: Yes. All source code and example domain and problem files are open source and available online: https://github.com/KCL-Planning/XAIPFramework.
Open Datasets: Yes. We used four temporal domains from the recent ICAPS international planning competitions (IPC) (Long & Fox, 2003) in our experiments. The IPC produces a new set of benchmark domains each year to test the capabilities and progress made by AI planners for different types of problems. We selected domains to be varied in what they modelled and the most interesting in terms of explainability. These are the Zeno Travel, Depots (IPC3), Crew Planning and Elevators (IPC8) domains.
Dataset Splits: No. The paper uses problems from established IPC benchmarks, selecting specific problems (e.g., "problem 10 for the Depots domain", "problems 1 to 10 for the Crew Planning domain"). It does not describe splitting a single dataset into training, validation, or test sets with percentages or sample counts, which is the typical meaning of dataset splits.
Hardware Specification: Yes. All tests used a Core i7 1.9 GHz machine and 16 GB of memory.
Software Dependencies: No. The paper mentions several software components like PDDL2.1, POPF, Metric-FF, OPTIC, and VAL, along with Qt-Designer. However, it does not provide specific version numbers for these software packages or for its own framework's dependencies, which is necessary for reproducible software dependency information.
Experiment Setup: Yes. We found that for each of the problems 3 minutes planning time was sufficient, other than problem 10 for the Depots domain, which required 6 minutes. ... All tests used a Core i7 1.9 GHz machine, limited to five minutes and 16 GB of memory. We increased the planning time from the first experiment by two minutes to offer a larger window through which to view any growth trends in the planning time for models with iterated constraints. ... To ensure that the questions made sense, we had to take slightly different approaches to generating each question type. For each formal question other than FQ1 and FQ3, the actions were randomly selected from the original plan found from the appropriate model. ... The lower bound was generated using a pseudo-random number generator, constrained to within the original plan time. The upper bound was formed by first generating a number between 1.5 and 4 and then multiplying the number by the duration of the selected action.
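The quoted bound-generation procedure can be sketched in a few lines. This is not the authors' code: the function name, signature, and use of Python's `random` module are assumptions; it simply follows the description literally (lower bound drawn uniformly within the original plan time, upper bound equal to the selected action's duration scaled by a factor in [1.5, 4]).

```python
import random

def generate_time_window(plan_makespan, action_duration, rng=None):
    """Hypothetical sketch of the paper's question-bound generation."""
    rng = rng or random.Random()
    # Lower bound: pseudo-random, constrained to within the original plan time.
    lower = rng.uniform(0.0, plan_makespan)
    # Upper bound: a factor between 1.5 and 4, multiplied by the duration
    # of the randomly selected action.
    upper = rng.uniform(1.5, 4.0) * action_duration
    return lower, upper
```

Note that, read literally, nothing guarantees the upper bound exceeds the lower bound; the paper does not state how (or whether) such cases were filtered.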