ModelDiff: Symbolic Dynamic Programming for Model-Aware Policy Transfer in Deep Q-Learning

Authors: Xiaotian Liu, Jihwan Jeong, Ayal Taitler, Michael Gimelfarb, Scott Sanner

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MD-DQN matches or outperforms existing TL methods and baselines in both positive and negative transfer settings. ... We evaluated the ModelDiff-informed MD-DQN (from here out, shortened to just MD-DQN) derived lower bound on three different domains in a transfer learning setting with source and target tasks in each.
Researcher Affiliation | Academia | ¹University of Toronto, Toronto, ON, Canada; ²Vector Institute for AI, Toronto, ON, Canada; ³Ben-Gurion University of the Negev, Be'er Sheva, Israel
Pseudocode | Yes | Algorithm 1: ModelDiff DQN (MD-DQN)
Open Source Code | No | The paper does not provide a specific link to source code or an explicit statement about its release in the main text or supplementary materials.
Open Datasets | No | We evaluate MD-DQN on three benchmark domains: POWERGEN, PICK-AND-PLACE, and RESERVOIR. ... All the domains and tasks have been implemented and trained in pyRDDLGym (Taitler et al. 2022).
Dataset Splits | No | The paper describes reinforcement learning experiments within custom environments (POWERGEN, PICK-AND-PLACE, RESERVOIR) and does not mention explicit training/test/validation dataset splits, which are typically used for supervised learning tasks.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory).
Software Dependencies | No | All the domains and tasks have been implemented and trained in pyRDDLGym (Taitler et al. 2022). (No version number is specified for pyRDDLGym or other software dependencies.)
Experiment Setup | Yes | Each task is set with a horizon of 20, and the discount factor γ is fixed at 0.9. ... The modified DQN loss for a transition (s, a, s′), derived from the lower bound Q_t^{π_s}, is (y_target − Q(s, a; θ))², with y_target = max( r(s, a) + γ max_{a′} Q(s′, a′; θ⁻), Q_t^{π_s}(s, a) ). ... Algorithm 1: ModelDiff DQN (MD-DQN) includes the epsilon-greedy action selection probability ϵ.
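The modified target quoted in the Experiment Setup row can be sketched in a few lines: the standard DQN bootstrap target is clamped from below by the ModelDiff-derived lower bound. The function and argument names below (`md_dqn_target`, `q_lb`, etc.) are illustrative assumptions, since the paper's code is not released; this is a minimal sketch of the target computation, not the authors' implementation.

```python
import random

def md_dqn_target(r, gamma, q_next, q_lb):
    """MD-DQN regression target for one transition (s, a, s').

    r      -- immediate reward r(s, a)
    gamma  -- discount factor (the paper fixes gamma = 0.9)
    q_next -- list of target-network values Q(s', a'; theta^-) over actions a'
    q_lb   -- ModelDiff-derived lower bound Q_t^{pi_s}(s, a)

    The usual DQN target r + gamma * max_a' Q(s', a') is clamped from
    below by the symbolic lower bound, so the learned Q-value estimate
    never falls below what the source policy already guarantees.
    """
    standard = r + gamma * max(q_next)
    return max(standard, q_lb)

def md_dqn_loss(q_sa, target):
    """Squared TD error (y_target - Q(s, a; theta))^2 against the clamped target."""
    return (target - q_sa) ** 2

def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy action selection as referenced in Algorithm 1:
    a uniformly random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, with r = 1.0, γ = 0.9, and next-state values [0.5, 2.0], the standard target is 2.8; a lower bound of 3.5 lifts the target to 3.5, while a bound of 0.0 leaves it at 2.8.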