Multi-intention Inverse Q-learning for Interpretable Behavior Representation

Authors: Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker

TMLR 2024

Reproducibility assessment (variable: result, followed by the supporting LLM response or paper excerpt):
Research Type: Experimental. "Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions."
Researcher Affiliation: Academia. "Hao Zhu EMAIL Department of Computer Science & IMBIT//BrainLinks-BrainTools, University of Freiburg"
Pseudocode: Yes. "Algorithm 1: Inverse action-value iteration, given expert demonstrations D. Algorithm 2: Inverse Q-learning, given expert demonstrations D and learning rates αr, αQ, and αSh. Algorithm 3: Hierarchical inverse Q-learning, given expert demonstrations D and reward set cardinality K."
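Algorithm 3 couples per-intention inverse Q-learning with inference over which of the K latent intentions generated each step of a trajectory. One way to sketch the inference half is standard forward-backward smoothing over the intention chain governed by the initial distribution Π and transition matrix Λ; the function name and the per-step log-likelihood input below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def intention_posteriors(log_lik, Pi, Lam):
    """Forward-backward smoothing over latent intentions.

    log_lik: (T, K) log-likelihood of each intention at each step
             (e.g., log-probability of the observed action under each
             intention's policy -- how this is computed is assumed here)
    Pi:      (K,) initial intention distribution
    Lam:     (K, K) row-stochastic intention transition matrix
    Returns: (T, K) posteriors P(intention_t | full trajectory).
    """
    T, K = log_lik.shape
    # Exponentiate with a per-step shift for numerical stability.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))

    # Forward pass (filtering), normalized at each step.
    alpha = np.zeros((T, K))
    alpha[0] = Pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Lam) * lik[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also normalized to avoid underflow.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = Lam @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

The per-intention reward and Q updates would then be weighted by these posteriors; that half depends on the paper's specific update equations and is omitted here.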
Open Source Code: Yes. "Implementation for the class of HIQL algorithms can be found at https://github.com/haozhu10015/hiql."
Open Datasets: Yes. "The expert demonstrations in this real-world dataset were originally collected by Rosenberg et al. (2021) from a 127-node labyrinth navigation task. ... The original recorded animal trajectories from Rosenberg et al. (2021) are provided with MIT open source license at the following repository: https://github.com/markusmeister/Rosenberg-2021-Repository."
Dataset Splits: Yes. "All evaluated algorithms were fit to multiple sub-datasets with different numbers of expert trajectories, and each dataset was analyzed using a 5-fold cross-validation. ... 20% of the trajectories from each dataset were held out as a test set. ... A 5-fold cross-validation was used to split the training and test dataset."
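Since the excerpt describes splitting at the trajectory level (whole trajectories, not individual transitions, are held out), a 5-fold split can be sketched as follows; all names and the use of a fixed seed are hypothetical, not taken from the paper:

```python
import numpy as np

def five_fold_splits(n_trajectories, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs over trajectory indices.

    Each fold holds out ~1/n_folds of the trajectories (20% for
    n_folds=5, matching the excerpt above).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trajectories)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train, test
```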
Hardware Specification: No. The paper describes experiments in a simulated gridworld environment and on real-world mouse behavior datasets, but does not specify the hardware (e.g., GPU/CPU models) used to run these experiments or simulations.
Software Dependencies: No. The paper does not name specific software with version numbers for the libraries, frameworks, or environments used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "The discount factor of the MDP was set to γ = 0.9. ... The initial intention distribution Π was initialized uniformly, and the intention transition matrix Λ was initialized as: Λ = 0.95 I + N(0, 0.05 I). ... The discount factor for this experiment was set to γ = 0.99."
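The stated initialization can be sketched as below. The quote only gives the additive form Λ = 0.95 I + N(0, 0.05 I), so the absolute value and row renormalization (needed to keep Λ a valid row-stochastic transition matrix) and the exact noise scale are assumptions:

```python
import numpy as np

def init_intention_params(K, seed=0):
    """Uniform initial distribution Pi and a near-identity, noisy
    transition matrix Lam over K intentions."""
    rng = np.random.default_rng(seed)
    Pi = np.full(K, 1.0 / K)
    # Near-identity plus small positive noise (abs() is an assumption
    # to keep all entries nonnegative), then renormalize each row.
    Lam = 0.95 * np.eye(K) + 0.05 * np.abs(rng.standard_normal((K, K)))
    Lam /= Lam.sum(axis=1, keepdims=True)
    return Pi, Lam
```

Starting Λ close to the identity encodes the prior that the animal's intention persists across consecutive steps, switching only occasionally.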