Multi-intention Inverse Q-learning for Interpretable Behavior Representation
Authors: Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions. |
| Researcher Affiliation | Academia | Hao Zhu, Department of Computer Science & IMBIT//BrainLinks-BrainTools, University of Freiburg |
| Pseudocode | Yes | Algorithm 1: Inverse action-value iteration (given expert demonstrations D). Algorithm 2: Inverse Q-learning (given expert demonstrations D; learning rates αr, αQ, and αSh). Algorithm 3: Hierarchical inverse Q-learning (given expert demonstrations D and reward set cardinality K). |
| Open Source Code | Yes | Implementation for the class of HIQL algorithms can be found at https://github.com/haozhu10015/hiql. |
| Open Datasets | Yes | The expert demonstrations in this real-world dataset were originally collected by Rosenberg et al. (2021) from a 127-node labyrinth navigation task. ... The original recorded animal trajectories from Rosenberg et al. (2021) are provided with MIT open source license at the following repository: https://github.com/markusmeister/Rosenberg-2021-Repository. |
| Dataset Splits | Yes | All evaluated algorithms were fit to multiple sub-datasets with different number of expert trajectories, and each dataset was analyzed using a 5-fold cross-validation. ... 20% of the trajectories from each dataset were held out as a test set. ... A 5-fold cross-validation was used to split the training and test dataset. |
| Hardware Specification | No | The paper describes experiments in a simulated gridworld environment and on real-world mice behavior datasets, but does not specify any particular hardware (e.g., GPU/CPU models) used for running these experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for libraries, frameworks, or environments used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The discount factor of the MDP was set to be γ = 0.9. ... The initial intention distribution Π was initialized uniformly, and the intention transition matrix Λ was initialized as: Λ = 0.95 I + N(0, 0.05 I) ... The discount factor for this experiment was set to be γ = 0.99. |
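The reported intention transition initialization, Λ = 0.95 I + N(0, 0.05 I), can be sketched in numpy. This is a hedged reading of the quoted formula: we assume the noise term means i.i.d. Gaussian entries with standard deviation 0.05, and that rows are clipped to non-negative values and normalized so Λ is a valid stochastic matrix; the function name `init_intention_transition` is a hypothetical helper, not from the paper's code.

```python
import numpy as np

def init_intention_transition(K, seed=0):
    """Sketch of Lambda = 0.95 * I + N(0, 0.05 I) for K intentions.

    Assumptions (not stated explicitly in the quoted setup): noise is
    i.i.d. Gaussian with std 0.05 on every entry, and the result is
    made non-negative and row-normalized to be a stochastic matrix.
    """
    rng = np.random.default_rng(seed)
    lam = 0.95 * np.eye(K) + rng.normal(0.0, 0.05, size=(K, K))
    lam = np.abs(lam)                        # keep entries non-negative
    lam /= lam.sum(axis=1, keepdims=True)    # each row sums to 1
    return lam
```

The heavy diagonal (0.95) encodes the prior that the animal's latent intention persists between consecutive steps, with small random off-diagonal mass allowing occasional intention switches.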
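The dataset-split protocol quoted above (5-fold cross-validation over trajectories, with each fold holding out 20% as a test set) can be sketched as follows. This is a minimal illustration of such a split over trajectory indices, assuming a random permutation before folding; `five_fold_splits` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def five_fold_splits(n_trajectories, seed=0):
    """Yield (train, test) index arrays for 5-fold CV over trajectories.

    Each fold holds out ~20% of the trajectories as the test set and
    trains on the remaining ~80%, matching the quoted protocol.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trajectories)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```

Splitting at the trajectory level (rather than per time step) keeps whole expert demonstrations out of training, which is the appropriate unit for evaluating behavior prediction.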