Transfer Q-Learning with Composite MDP Structures
Authors: Jinhang Chai, Elynn Chen, Lin Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations, showing how structural similarities in transition dynamics enable efficient knowledge transfer. |
| Researcher Affiliation | Academia | (1) Department of Operations Research and Financial Engineering, Princeton University; (2) Department of Technology, Operations, and Statistics, New York University; (3) Department of Electrical and Computer Engineering, UCLA. |
| Pseudocode | Yes | Algorithm 1: UCB-Q Learning for HD Composite MDPs; Algorithm 2: UCB-TQL for Composite MDPs |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention that code is available in supplementary materials. |
| Open Datasets | No | The paper is theoretical in nature, focusing on a composite MDP framework and algorithms. It does not conduct empirical studies using specific datasets; therefore, no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments with specific datasets. Consequently, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis rather than empirical experimentation. Therefore, no specific hardware used for running experiments is mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe the implementation of its algorithms or any experimental setup. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical, presenting a new MDP framework and associated algorithms with provable guarantees. It does not include any experimental validation, and as such, no specific experimental setup details, hyperparameters, or system-level training settings are provided. |
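The composite MDP structure described above models each transition matrix as a low-rank shared component plus a sparse task-specific perturbation. The following is a minimal toy sketch of that decomposition; the dimensions, rank, and sparsity level are illustrative choices, not values from the paper, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sa, n_s = 12, 6  # (state, action) pairs x next states; toy sizes chosen for illustration

# Low-rank shared component: a rank-2 factorization U @ V.
U = rng.random((n_sa, 2))
V = rng.random((2, n_s))
low_rank = U @ V

# Sparse task-specific component: a handful of nonzero perturbations.
sparse = np.zeros((n_sa, n_s))
idx = rng.choice(n_sa * n_s, size=5, replace=False)
sparse.flat[idx] = rng.random(5)

# Composite transition matrix: sum the two parts, then normalize each row
# so that every (state, action) pair yields a probability distribution.
P = low_rank + sparse
P /= P.sum(axis=1, keepdims=True)

assert np.linalg.matrix_rank(low_rank) == 2  # shared part stays low-rank
assert np.allclose(P.sum(axis=1), 1.0)       # each row is a valid distribution
```

In a transfer setting, the low-rank factors would be shared (and estimated jointly) across tasks, while each task keeps its own sparse correction.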
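The two pseudocode listings the paper provides are UCB-style Q-learning variants. Since the paper's own code is not released, the sketch below shows only the generic tabular UCB-Q template (optimistic initialization, Hoeffding-style bonus, and the (H+1)/(H+t) learning rate in the style of Jin et al., 2018) that such algorithms build on; the MDP, constants, and episode count here are invented for illustration, and this does not reproduce UCB-TQL's transfer mechanism.

```python
import math
import numpy as np

# Toy episodic MDP: 3 states, 2 actions, horizon 4 -- illustrative only.
S, A, H, EPISODES = 3, 2, 4, 500
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = rng.random((S, A))                      # rewards in [0, 1]

Q = np.full((H, S, A), float(H))            # optimistic initialization at the max value H
V = np.zeros((H + 1, S))                    # V[H] = 0 terminates the backup
N = np.zeros((H, S, A))                     # visit counts

for _ in range(EPISODES):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q[h, s]))         # act greedily w.r.t. optimistic Q
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)           # stage-dependent learning rate
        # Hoeffding-style exploration bonus shrinking with the visit count.
        bonus = math.sqrt(H**3 * math.log(2 * S * A * H * EPISODES) / t)
        s_next = rng.choice(S, p=P[s, a])
        target = R[s, a] + V[h + 1, s_next] + bonus
        Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
        V[h, s] = min(H, Q[h, s].max())     # clip the optimistic value at H
        s = s_next
```

The paper's Algorithm 2 additionally reuses estimates of the shared low-rank dynamics from source tasks when forming these updates, which is the part this generic sketch omits.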