Transfer Q-Learning with Composite MDP Structures

Authors: Jinhang Chai, Elynn Chen, Lin Yang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations. This work represents a significant step toward bridging the gap between empirical success of transfer RL and theoretical understanding by providing a rigorous analysis of how structural similarities in transition dynamics enable efficient knowledge transfer.
Researcher Affiliation | Academia | (1) Department of Operations Research and Financial Engineering, Princeton University; (2) Department of Technology, Operations, and Statistics, New York University; (3) Department of Electrical and Computer Engineering, UCLA.
Pseudocode | Yes | Algorithm 1: UCB-Q Learning for HD Composite MDPs; Algorithm 2: UCB-TQL for Composite MDPs
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention that code is available in supplementary materials.
Open Datasets | No | The paper is theoretical in nature, focusing on a composite MDP framework and algorithms. It does not conduct empirical studies using specific datasets; therefore, no information about publicly available or open datasets is provided.
Dataset Splits | No | The paper is theoretical and does not perform experiments with specific datasets. Consequently, there is no mention of training/test/validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis rather than empirical experimentation. Therefore, no specific hardware used for running experiments is mentioned.
Software Dependencies | No | The paper is theoretical and does not describe the implementation of its algorithms or any experimental setup. Therefore, no specific software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical, presenting a new MDP framework and associated algorithms with provable guarantees. It does not include any experimental validation, and as such, no specific experimental setup details, hyperparameters, or system-level training settings are provided.
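
The composite structure described under Research Type, where transition dynamics are the sum of a shared low-rank component and a sparse task-specific component, can be illustrated with a small numerical sketch. This is not code from the paper, which releases none. The matrix sizes, the rank r, the sparsity level s, and every function name below are hypothetical choices made for illustration, and the matrices are generic real matrices rather than normalized transition kernels.

```python
import random


def add(a, b):
    """Entrywise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def low_rank(p, q, r, rng):
    """Sum of r random outer products: a p x q matrix of rank at most r."""
    m = [[0.0] * q for _ in range(p)]
    for _ in range(r):
        u = [rng.gauss(0, 1) for _ in range(p)]
        v = [rng.gauss(0, 1) for _ in range(q)]
        m = add(m, [[ui * vj for vj in v] for ui in u])
    return m


def sparse(p, q, s, rng):
    """A p x q matrix with exactly s nonzero entries at random positions."""
    m = [[0.0] * q for _ in range(p)]
    cells = [(i, j) for i in range(p) for j in range(q)]
    for i, j in rng.sample(cells, s):
        m[i][j] = rng.uniform(0.5, 1.0)
    return m


def count_nonzero(m, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for row in m for x in row if abs(x) > tol)


def matrix_rank(m, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, rows):
            f = a[i][c] / a[rank][c]
            for j in range(c, cols):
                a[i][j] -= f * a[rank][j]
        rank += 1
    return rank


# Hypothetical sizes: 6 x 6 matrices, shared rank 2, 3 task-specific entries.
rng = random.Random(0)
p, q, r, s = 6, 6, 2, 3

shared = low_rank(p, q, r, rng)           # structure common to both tasks
source = add(shared, sparse(p, q, s, rng))  # source task = shared + sparse
target = add(shared, sparse(p, q, s, rng))  # target task = shared + sparse

# The tasks differ only through their sparse parts, so the gap between
# them has at most 2 * s nonzero entries even though both are dense.
gap = sub(target, source)
print(matrix_rank(shared), count_nonzero(gap))
```

In this toy setting the source and target matrices are both dense, yet their difference is supported on at most 2s entries. That is the kind of structural similarity the summary credits with enabling transfer: the shared component can in principle be learned from the source task, leaving only a sparse correction to estimate for the target.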