ModelDiff: Symbolic Dynamic Programming for Model-Aware Policy Transfer in Deep Q-Learning
Authors: Xiaotian Liu, Jihwan Jeong, Ayal Taitler, Michael Gimelfarb, Scott Sanner
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that MD-DQN matches or outperforms existing TL methods and baselines in both positive and negative transfer settings. ... We evaluated the ModelDiff-informed DQN (from here on shortened to MD-DQN) and its derived lower bound on three different domains in a transfer learning setting, with source and target tasks in each. |
| Researcher Affiliation | Academia | ¹University of Toronto, Toronto, ON, Canada; ²Vector Institute for AI, Toronto, ON, Canada; ³Ben-Gurion University of the Negev, Be'er Sheva, Israel |
| Pseudocode | Yes | Algorithm 1: ModelDiff DQN (MD-DQN) |
| Open Source Code | No | The paper does not provide a specific link to source code or an explicit statement about its release in the main text or supplementary materials. |
| Open Datasets | No | We evaluate MD-DQN on three benchmark domains: POWERGEN, PICK-AND-PLACE, and RESERVOIR. ... All the domains and tasks have been implemented and trained in pyRDDLGym (Taitler et al. 2022). |
| Dataset Splits | No | The paper describes reinforcement learning experiments within custom environments (POWERGEN, PICK-AND-PLACE, RESERVOIR) and does not mention explicit training/test/validation dataset splits, which are typically used for supervised learning tasks. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | All the domains and tasks have been implemented and trained in pyRDDLGym (Taitler et al. 2022). (No version number is specified for pyRDDLGym or any other software dependency.) |
| Experiment Setup | Yes | Each task is set with a horizon of 20, and the discount factor γ is fixed at 0.9. ... The modified DQN loss for a transition $(s, a, s')$, derived from the lower bound $Q^{\pi_s}_t$, is $\tfrac{1}{2}\big(y_{\text{target}} - Q(s, a; \theta)\big)^2$, where $y_{\text{target}} = \max\!\big(r(s, a) + \gamma \max_{a'} Q(s', a'; \theta^{-}),\; Q^{\pi_s}_t(s, a)\big)$. ... Algorithm 1: ModelDiff DQN (MD-DQN) includes the epsilon-greedy action selection probability ϵ. |
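The modified target quoted above clips the standard DQN bootstrapped target from below by the ModelDiff-derived lower bound $Q^{\pi_s}_t(s, a)$. A minimal sketch of that computation, assuming scalar rewards and a precomputed vector of next-state Q-values (function names here are illustrative, not from the paper's code):

```python
import numpy as np

def md_dqn_target(r, gamma, q_next, q_lower_bound):
    """Hypothetical sketch of the MD-DQN target: the usual one-step
    DQN target, floored at the ModelDiff-derived lower bound."""
    standard_target = r + gamma * np.max(q_next)  # r(s,a) + γ max_a' Q(s',a';θ⁻)
    return max(standard_target, q_lower_bound)    # clip from below by Q^{π_s}_t(s,a)

def md_dqn_loss(q_sa, target):
    # Squared TD error between the online estimate Q(s,a;θ) and the clipped target.
    return 0.5 * (target - q_sa) ** 2

# Example: with γ = 0.9 as in the paper, a lower bound of 5.0 dominates
# a standard target of 1.0 + 0.9 * 3.0 = 3.7, so the bound is used.
y = md_dqn_target(1.0, 0.9, np.array([2.0, 3.0]), 5.0)
```

The `max` against the lower bound is what makes the transfer "model-aware": when the source-policy bound exceeds the bootstrapped estimate, it accelerates learning; otherwise training reduces to standard DQN.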