Towards Explainable Goal Recognition Using Weight of Evidence (WoE): A Human-Centered Approach
Authors: Abeer Alshehri, Amal Abdulrahman, Hajar Alamri, Tim Miller, Mor Vered
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the model computationally across eight GR benchmarks and through three user studies. The first study assesses the efficiency of generating human-like explanations within the Sokoban game domain, the second examines perceived explainability in the same domain, and the third evaluates the model's effectiveness in aiding decision-making in illegal fishing detection. Results demonstrate that the XGR model significantly enhances user understanding, trust, and decision-making compared to baseline models, underscoring its potential to improve human-agent collaboration. |
| Researcher Affiliation | Academia | Abeer Alshehri (School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia; Department of Computer Science and Information Systems, King Khalid University, Abha, Saudi Arabia); Amal Abdulrahman (School of Computing, Macquarie University, Sydney, Australia); Hajar Alamri (Department of Computer Science and Information Systems, King Khalid University, Abha, Saudi Arabia); Tim Miller (School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, Australia); Mor Vered (School of Computing and Information Systems, Monash University, Melbourne, Australia) |
| Pseudocode | Yes | Algorithm 1: Explanation Generation Algorithm. Input: Oi, oi, Gp, Gc, and posterior probability over G. Output: explanation list Ω for all pairs (Gp, Gc). 1: Ω ← [] {initialize explanation list} 2: for oi ∈ O do 3: for g ∈ Gp do 4: for g′ ∈ Gc do 5: ωi ← woe(g/g′ : oi \| Oi) {compute Weight of Evidence (WoE)} 6: Ω ← Ω ∪ {⟨(g, g′), ωi, oi⟩} {add explanation to list} 7: end for 8: end for 9: end for 10: return Ω |
| Open Source Code | No | All data will be made available upon request. |
| Open Datasets | Yes | We evaluate the computational cost of the XGR model over eight online GR benchmark domains (Vered et al., 2018). We obtained the dataset from Penney et al. (2021), which was collected from professional StarCraft tournaments available as videos on demand from 2016 and 2017. |
| Dataset Splits | No | The paper mentions collecting data for user studies and identifying instances for analysis (e.g., "a total of 132 instances out of the six samples" for StarCraft), but it does not specify explicit training/test/validation dataset splits with percentages, sample counts, or methodology for machine learning model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | We used a STRIPS-like discrete planner to generate plan hypotheses derived from the domain theory and observations as our ground truth. |
| Experiment Setup | No | The paper describes the setup for human studies and computational evaluations, including the number of scenarios and participants, but does not provide specific model hyperparameters or training configurations for the XGR model (e.g., learning rate, batch size, number of epochs, optimizer settings) in the main text. |
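The triple loop of the Algorithm 1 pseudocode reported above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `likelihood` callable, the use of Good's log-likelihood-ratio form of WoE, and all function and variable names here are assumptions.

```python
import math
from itertools import product


def woe(p_obs_given_g: float, p_obs_given_g_prime: float) -> float:
    """Weight of evidence for goal g over contrast goal g' given one
    observation: the log-likelihood ratio log[P(o | g) / P(o | g')]
    (Good's WoE form, assumed here; the paper additionally conditions
    on the prior observations Oi)."""
    return math.log(p_obs_given_g / p_obs_given_g_prime)


def generate_explanations(observations, promoted_goals, contrast_goals, likelihood):
    """Enumerate WoE explanations for every (observation, g, g') triple,
    mirroring Algorithm 1's nested loops. `likelihood(o, g)` is a
    hypothetical callable returning P(o | g)."""
    explanations = []  # Ω, the explanation list
    for o_i, g, g_prime in product(observations, promoted_goals, contrast_goals):
        omega_i = woe(likelihood(o_i, g), likelihood(o_i, g_prime))
        explanations.append(((g, g_prime), omega_i, o_i))  # ⟨(g, g′), ωi, oi⟩
    return explanations
```

For example, with a single observation that is four times as likely under goal A as under goal B, the sketch yields one explanation triple whose WoE is log 4 ≈ 1.386, i.e., positive evidence favoring goal A.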