Transfer Q-Learning with Composite MDP Structures
Authors: Jinhang Chai, Elynn Chen, Lin Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations, showing how structural similarities in transition dynamics enable efficient knowledge transfer. |
| Researcher Affiliation | Academia | (1) Department of Operations Research and Financial Engineering, Princeton University; (2) Department of Technology, Operations, and Statistics, New York University; (3) Department of Electrical and Computer Engineering, UCLA. |
| Pseudocode | Yes | Algorithm 1: UCB-Q Learning for HD Composite MDPs; Algorithm 2: UCB-TQL for Composite MDPs |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention that code is available in supplementary materials. |
| Open Datasets | No | The paper is theoretical in nature, focusing on a composite MDP framework and algorithms. It does not conduct empirical studies using specific datasets; therefore, no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments with specific datasets. Consequently, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis rather than empirical experimentation. Therefore, no specific hardware used for running experiments is mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe the implementation of its algorithms or any experimental setup. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical, presenting a new MDP framework and associated algorithms with provable guarantees. It does not include any experimental validation, and as such, no specific experimental setup details, hyperparameters, or system-level training settings are provided. |
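The composite MDP structure described above models each transition matrix as a low-rank shared component plus a sparse task-specific perturbation. The following is a minimal toy sketch of that decomposition; the dimensions, rank, and sparsity level are illustrative choices, not values from the paper, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sa, n_s = 12, 6  # (state, action) pairs x next states; toy sizes chosen for illustration

# Low-rank shared component: a rank-2 factorization U @ V.
U = rng.random((n_sa, 2))
V = rng.random((2, n_s))
low_rank = U @ V

# Sparse task-specific component: a handful of nonzero perturbations.
sparse = np.zeros((n_sa, n_s))
idx = rng.choice(n_sa * n_s, size=5, replace=False)
sparse.flat[idx] = rng.random(5)

# Composite transition matrix: sum the two parts, then normalize each row
# so that every (state, action) pair yields a probability distribution.
P = low_rank + sparse
P /= P.sum(axis=1, keepdims=True)

assert np.linalg.matrix_rank(low_rank) == 2  # shared part stays low-rank
assert np.allclose(P.sum(axis=1), 1.0)       # each row is a valid distribution
```

In a transfer setting, the low-rank factors would be shared (and estimated jointly) across tasks, while each task keeps its own sparse correction.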
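The two pseudocode listings the paper provides are UCB-style Q-learning variants. Since the paper's own code is not released, the sketch below shows only the generic tabular UCB-Q template (optimistic initialization, Hoeffding-style bonus, and the (H+1)/(H+t) learning rate in the style of Jin et al., 2018) that such algorithms build on; the MDP, constants, and episode count here are invented for illustration, and this does not reproduce UCB-TQL's transfer mechanism.

```python
import math
import numpy as np

# Toy episodic MDP: 3 states, 2 actions, horizon 4 -- illustrative only.
S, A, H, EPISODES = 3, 2, 4, 500
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = rng.random((S, A))                      # rewards in [0, 1]

Q = np.full((H, S, A), float(H))            # optimistic initialization at the max value H
V = np.zeros((H + 1, S))                    # V[H] = 0 terminates the backup
N = np.zeros((H, S, A))                     # visit counts

for _ in range(EPISODES):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q[h, s]))         # act greedily w.r.t. optimistic Q
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)           # stage-dependent learning rate
        # Hoeffding-style exploration bonus shrinking with the visit count.
        bonus = math.sqrt(H**3 * math.log(2 * S * A * H * EPISODES) / t)
        s_next = rng.choice(S, p=P[s, a])
        target = R[s, a] + V[h + 1, s_next] + bonus
        Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
        V[h, s] = min(H, Q[h, s].max())     # clip the optimistic value at H
        s = s_next
```

The paper's Algorithm 2 additionally reuses estimates of the shared low-rank dynamics from source tasks when forming these updates, which is the part this generic sketch omits.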