GRAML: Goal Recognition As Metric Learning

Authors: Matan Shamir, Reuth Mirsky

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluated on a versatile set of environments, GRAML shows speed, flexibility, and runtime improvements over the state-of-the-art GR while maintaining accurate recognition."
Researcher Affiliation | Academia | "Matan Shamir¹, Reuth Mirsky¹,² — ¹Computer Science Department, Bar-Ilan University, Israel; ²Computer Science Department, Tufts University, MA, USA; EMAIL, EMAIL"
Pseudocode | No | The paper describes its steps in regular paragraph text without structured formatting, and no figures are labeled as pseudocode or as an algorithm.
Open Source Code | Yes | https://github.com/MatanShamir1/Grlib
Open Datasets | Yes | "Building on the GCRL survey and the benchmark environments suggested at Apex RL, we form a collection of GR problems from several sets of environments that adhere to the Gymnasium API, with detailed descriptions of each in Appendix ??. We consider two custom Minigrid environments from the minigrid package [Chevalier-Boisvert et al., 2023], two custom Point Maze environments from the Gymnasium-Robotics package [Fu et al., 2020], the Parking environment from the highway-env package [Leurent, 2018], and the Reach environment from Panda Gym [Gallouédec et al., 2021]."
Dataset Splits | No | The paper mentions varying observation-sequence lengths (30%, 50%, 70%, 100%) and generating "200 GR problems per scenario", but does not specify training, validation, or test splits in a way that allows the data partitioning to be reproduced.
Hardware Specification | Yes | "All experiments were conducted on a commodity Intel i7 processor."
Software Dependencies | No | The paper mentions software such as Python, PyTorch, Stable Baselines3, the Gymnasium API, and the minigrid, Gymnasium-Robotics, highway-env, and Panda Gym packages, but does not provide version numbers for these components.
Experiment Setup | Yes | "Each single-goal agent was trained for 300,000 timesteps, and the goal-conditioned agent was trained for 1 million timesteps. ... G was set to 20, while BG-GRAML used only 5. ... For each environment, we tested observation sequences that are 30%, 50%, 70%, and 100% of the full sequence, both consecutively and non-consecutively."
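The evaluation grid above crosses each environment's candidate goals with the four observation-sequence fractions, in both consecutive and non-consecutive variants. A minimal sketch of how such a grid of GR problems could be enumerated; the goal sets below are hypothetical placeholders (the paper's custom MiniGrid and Point Maze variants are not public under these exact names, and the true goal sets are not listed here):

```python
# Sketch: enumerating a grid of GR problems (hypothetical goal sets per environment).
BENCHMARKS = {
    "MiniGrid-Custom-v0": [(1, 1), (6, 6)],      # hypothetical goal cells
    "PointMaze-Custom-v3": [(0.5, 0.5)],         # hypothetical goal position
    "parking-v0": [3, 7],                        # hypothetical parking-slot goals
    "PandaReach-v3": [(0.1, 0.1, 0.1)],          # hypothetical reach target
}
FRACTIONS = [0.3, 0.5, 0.7, 1.0]  # 30%, 50%, 70%, 100% of the full sequence

def enumerate_gr_problems(benchmarks, fractions):
    """Cross every candidate goal with every observation fraction,
    in both consecutive and non-consecutive variants, mirroring the
    paper's evaluation grid (one entry per scenario configuration)."""
    problems = []
    for env_id, goals in benchmarks.items():
        for goal in goals:
            for frac in fractions:
                for consecutive in (True, False):
                    problems.append({
                        "env": env_id,
                        "goal": goal,
                        "fraction": frac,
                        "consecutive": consecutive,
                    })
    return problems
```

Each resulting entry describes one scenario configuration; the paper reports generating 200 GR problems per such scenario.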
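The setup also truncates each observation sequence to 30%, 50%, 70%, or 100% of its length, either consecutively or non-consecutively. A minimal sketch of one plausible truncation scheme; the paper does not spell out how non-consecutive subsequences are sampled, so the evenly spaced indexing below is an assumption:

```python
def truncate_observations(seq, fraction, consecutive=True):
    """Keep a `fraction` of an observation sequence: a prefix when
    consecutive, or evenly spaced samples spanning the whole sequence
    otherwise (an assumed sampling scheme, not necessarily the paper's)."""
    k = max(1, round(len(seq) * fraction))
    if consecutive:
        return seq[:k]                      # e.g. fraction=0.5 keeps the first half
    if k == 1:
        return [seq[0]]
    step = (len(seq) - 1) / (k - 1)         # spread k indices over the full range
    return [seq[round(i * step)] for i in range(k)]
```

For a 10-step trajectory, `truncate_observations(traj, 0.5)` returns the first 5 observations, while the non-consecutive variant returns 5 observations spread from the first step to the last.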