Multi-intention Inverse Q-learning for Interpretable Behavior Representation

Authors: Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker

TMLR 2024

Reproducibility assessment (variable: result, followed by the supporting LLM response or paper excerpt):
Research Type: Experimental. "Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions."
Researcher Affiliation: Academia. "Hao Zhu EMAIL Department of Computer Science & IMBIT//BrainLinks-BrainTools, University of Freiburg"
Pseudocode: Yes. "Algorithm 1: Inverse action-value iteration, given expert demonstrations D. Algorithm 2: Inverse Q-learning, given expert demonstrations D and learning rates αr, αQ, and αSh. Algorithm 3: Hierarchical inverse Q-learning, given expert demonstrations D and reward set cardinality K."
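Algorithm 3 couples per-intention inverse Q-learning with inference over which of the K latent intentions generated each step of a trajectory. One way to sketch the inference half is standard forward-backward smoothing over the intention chain governed by the initial distribution Π and transition matrix Λ; the function name and the per-step log-likelihood input below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def intention_posteriors(log_lik, Pi, Lam):
    """Forward-backward smoothing over latent intentions.

    log_lik: (T, K) log-likelihood of each intention at each step
             (e.g., log-probability of the observed action under each
             intention's policy -- how this is computed is assumed here)
    Pi:      (K,) initial intention distribution
    Lam:     (K, K) row-stochastic intention transition matrix
    Returns: (T, K) posteriors P(intention_t | full trajectory).
    """
    T, K = log_lik.shape
    # Exponentiate with a per-step shift for numerical stability.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))

    # Forward pass (filtering), normalized at each step.
    alpha = np.zeros((T, K))
    alpha[0] = Pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Lam) * lik[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also normalized to avoid underflow.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = Lam @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

The per-intention reward and Q updates would then be weighted by these posteriors; that half depends on the paper's specific update equations and is omitted here.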
Open Source Code: Yes. "Implementation for the class of HIQL algorithms can be found at https://github.com/haozhu10015/hiql."
Open Datasets: Yes. "The expert demonstrations in this real-world dataset were originally collected by Rosenberg et al. (2021) from a 127-node labyrinth navigation task. ... The original recorded animal trajectories from Rosenberg et al. (2021) are provided with MIT open source license at the following repository: https://github.com/markusmeister/Rosenberg-2021-Repository."
Dataset Splits: Yes. "All evaluated algorithms were fit to multiple sub-datasets with different numbers of expert trajectories, and each dataset was analyzed using a 5-fold cross-validation. ... 20% of the trajectories from each dataset were held out as a test set. ... A 5-fold cross-validation was used to split the training and test dataset."
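Since the excerpt describes splitting at the trajectory level (whole trajectories, not individual transitions, are held out), a 5-fold split can be sketched as follows; all names and the use of a fixed seed are hypothetical, not taken from the paper:

```python
import numpy as np

def five_fold_splits(n_trajectories, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs over trajectory indices.

    Each fold holds out ~1/n_folds of the trajectories (20% for
    n_folds=5, matching the excerpt above).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trajectories)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train, test
```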
Hardware Specification: No. The paper describes experiments in a simulated gridworld environment and on real-world mouse behavior datasets, but does not specify the hardware (e.g., GPU/CPU models) used to run these experiments or simulations.
Software Dependencies: No. The paper does not name specific software with version numbers for the libraries, frameworks, or environments used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "The discount factor of the MDP was set to γ = 0.9. ... The initial intention distribution Π was initialized uniformly, and the intention transition matrix Λ was initialized as: Λ = 0.95 I + N(0, 0.05 I). ... The discount factor for this experiment was set to γ = 0.99."
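The stated initialization can be sketched as below. The quote only gives the additive form Λ = 0.95 I + N(0, 0.05 I), so the absolute value and row renormalization (needed to keep Λ a valid row-stochastic transition matrix) and the exact noise scale are assumptions:

```python
import numpy as np

def init_intention_params(K, seed=0):
    """Uniform initial distribution Pi and a near-identity, noisy
    transition matrix Lam over K intentions."""
    rng = np.random.default_rng(seed)
    Pi = np.full(K, 1.0 / K)
    # Near-identity plus small positive noise (abs() is an assumption
    # to keep all entries nonnegative), then renormalize each row.
    Lam = 0.95 * np.eye(K) + 0.05 * np.abs(rng.standard_normal((K, K)))
    Lam /= Lam.sum(axis=1, keepdims=True)
    return Pi, Lam
```

Starting Λ close to the identity encodes the prior that the animal's intention persists across consecutive steps, switching only occasionally.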