Goal Recognition Design for General Behavioral Agents using Machine Learning

Authors: Robert Kasumba, Guanghui Yu, Chien-Ju Ho, Sarah Keren, William Yeoh

TMLR 2025

Reproducibility assessment (variable, result, and LLM justification):
Research Type: Experimental. "Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing wcd and enhances runtime efficiency. Moreover, our approach also adapts to settings in which existing approaches do not apply, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Finally, we conducted human-subject experiments that demonstrate that our method creates environments that facilitate efficient goal recognition from human decision-makers."
Researcher Affiliation: Academia. Robert Kasumba (Washington University in Saint Louis), Guanghui Yu (Washington University in Saint Louis), Chien-Ju Ho (Washington University in Saint Louis), Sarah Keren (Technion - Israel Institute of Technology), William Yeoh (Washington University in Saint Louis).
Pseudocode: No. The paper describes the optimization procedure as a "discrete gradient descent procedure" in Section 3.3 but does not present it as structured pseudocode or an algorithm block.
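Since the paper gives no algorithm block, the following is a hypothetical sketch of what a "discrete gradient descent" over environment modifications could look like: a greedy local search that flips one grid cell at a time, keeping the flip that most reduces a predicted objective, until a modification budget is spent or no flip helps. The `predicted_wcd` function and the grid encoding are illustrative stand-ins, not the paper's learned CNN predictor.

```python
from itertools import product

def predicted_wcd(env):
    # Stand-in for the paper's learned wcd predictor: a toy objective
    # that counts open cells (0) near the centre of the grid.
    n = len(env)
    return sum(1 - env[i][j] for i in range(n) for j in range(n)
               if abs(i - n // 2) + abs(j - n // 2) <= 2)

def discrete_descent(env, budget):
    """Greedily toggle one cell at a time (open <-> blocked),
    committing the flip that most reduces the predicted objective."""
    env = [row[:] for row in env]
    score = predicted_wcd(env)
    for _ in range(budget):
        best = None
        for i, j in product(range(len(env)), repeat=2):
            env[i][j] ^= 1                      # try flipping one cell
            s = predicted_wcd(env)
            env[i][j] ^= 1                      # undo the trial flip
            if s < score and (best is None or s < best[0]):
                best = (s, i, j)
        if best is None:                        # local minimum reached
            break
        score, i, j = best
        env[i][j] ^= 1                          # commit the best flip
    return env, score

grid = [[0] * 5 for _ in range(5)]              # all-open 5x5 grid
new_grid, final = discrete_descent(grid, budget=3)
print(final)
```

Each iteration evaluates every single-cell change and keeps the best one, which is one plausible discrete analogue of a gradient step; the paper's actual procedure may differ.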
Open Source Code: No. The paper contains no explicit statement about releasing the source code for its methodology and provides no link to a code repository for this work; it links only to the OpenReview forum and the Overcooked-AI environment, a third-party tool.
Open Datasets: No. "To build the predictive model for wcd, we curate a training dataset through simulations. For an environment w and agent behavioral model h, we can obtain wcd(w, h) by solving for the agent's actions towards different goals. After collecting a training dataset, we train the predictive model using a convolutional neural network. The implementation details are in Section 5.1.1 and the appendix." Experiment 1 (Collection of Human Behavioral Data) states: "The collected human data were split into training (160 workers, 70,000 user decisions), validation, and testing sets (20 workers, 8,800 decisions each)." The paper describes how the data were generated and collected but provides no concrete access information (link, DOI, or repository) for these datasets.
Dataset Splits: Yes. "The collected human data were split into training (160 workers, 70,000 user decisions), validation, and testing sets (20 workers, 8,800 decisions each)."
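The reported split is by worker (160 train / 20 validation / 20 test), which implies decisions are grouped by worker rather than shuffled individually. A minimal sketch of such a leakage-free split, with illustrative record and field names (the paper does not specify its splitting code):

```python
import random

def split_by_worker(decisions, n_train=160, n_val=20, n_test=20, seed=0):
    """Split decision records by worker ID so that no worker appears
    in more than one split (avoids leakage across splits)."""
    workers = sorted({d["worker"] for d in decisions})
    assert len(workers) == n_train + n_val + n_test
    random.Random(seed).shuffle(workers)
    groups = {
        "train": set(workers[:n_train]),
        "val": set(workers[n_train:n_train + n_val]),
        "test": set(workers[n_train + n_val:]),
    }
    return {name: [d for d in decisions if d["worker"] in ids]
            for name, ids in groups.items()}

# Toy data: 200 workers with a handful of decisions each.
data = [{"worker": w, "action": a} for w in range(200) for a in range(5)]
splits = split_by_worker(data)
print({k: len(v) for k, v in splits.items()})
```

Splitting on worker IDs rather than on individual decisions is the standard way to keep all of one participant's behavior inside a single split.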
Hardware Specification: Yes. "All experiments were run on a computing cluster equipped with 40 CPU cores (Intel Xeon Gold 6148 @ 2.40GHz), a single NVIDIA Tesla V100 SXM2 GPU (32GB), and up to 80GB of memory."
Software Dependencies: No. The paper states: "Python 3.10 and widely used scientific libraries. PyTorch was our main deep learning framework, with NumPy and pandas handling numerical computation and data processing." While Python 3.10 is specified, no version numbers are given for PyTorch, NumPy, or pandas, which are key dependencies.
Experiment Setup: Yes. "We used Adam optimizer and MSE loss and tested learning rates of 0.1, 0.01, 0.001, and 0.0001. A learning rate of 0.001 consistently produced the lowest validation error... The best-performing configuration, CNN (100K, 0.001) combined with our gradient-based optimization, achieved the greatest reduction in wcd..."
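The sweep described above can be sketched as follows. This is a pure-Python stand-in: a one-parameter linear model trained with a hand-rolled Adam update and MSE loss in place of the authors' PyTorch CNN, so the learning rate it selects on this toy problem need not match the paper's 0.001. All data and names are illustrative.

```python
import math, random

def train_adam(xs, ys, lr, steps=200):
    """Fit y ~ w * x with MSE loss using Adam updates; a stand-in
    for the paper's PyTorch CNN training loop."""
    w, m, v = 0.0, 0.0, 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8          # standard Adam defaults
    n = len(xs)
    for t in range(1, steps + 1):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        m = b1 * m + (1 - b1) * grad         # first-moment estimate
        v = b2 * v + (1 - b2) * grad * grad  # second-moment estimate
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        w -= lr * mhat / (math.sqrt(vhat) + eps)
    return w

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * x + 0.01 * rng.gauss(0, 1) for x in xs]
train_x, val_x = xs[20:], xs[:20]            # toy train/validation split
train_y, val_y = ys[20:], ys[:20]

errors = {lr: mse(train_adam(train_x, train_y, lr), val_x, val_y)
          for lr in (0.1, 0.01, 0.001, 0.0001)}  # grid from the paper
best_lr = min(errors, key=errors.get)        # pick lowest validation MSE
print(best_lr)
```

Selecting the learning rate by lowest validation error, as the quoted passage describes, is exactly this argmin over the candidate grid; only the model being trained differs here.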