Model-Free Representation Learning and Exploration in Low-Rank MDPs

Authors: Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we present the first model-free representation learning algorithms for low-rank MDPs. The key algorithmic contribution is a new minimax representation learning objective, for which we provide variants with differing tradeoffs in their statistical and computational properties. We interleave this representation learning step with an exploration strategy to cover the state space in a reward-free manner. The resulting algorithms are provably sample efficient and can accommodate general function approximation to scale to complex environments. Keywords: reinforcement learning, representation learning, low-rank MDPs, sample complexity analysis, reward-free exploration
Researcher Affiliation | Collaboration | Aditya Modi, Microsoft, Mountain View, CA 94043, USA; Jinglin Chen, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Akshay Krishnamurthy, Microsoft Research, New York, NY 10011, USA; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Alekh Agarwal, Google Research
Pseudocode | Yes | Algorithm 1: Moffle(R, Φ, ηmin, ε, δ), Model-Free Feature Learning and Exploration; Algorithm 2: Explore(Φ, ηmin, δ); Algorithm 3: Feature Selection via Greedy Improvement; Algorithm 4: Elliptical Planner with FQI and FQE; Algorithm 5: FQI (Fitted Q-Iteration); Algorithm 6: FQE (Fitted Q-Evaluation)
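For context on the fitted Q-iteration subroutine named above (Algorithm 5), a minimal sketch of generic FQI with linear function approximation is given below. This is an illustration of the textbook procedure only, not the paper's exact algorithm; the function names, the linear feature class, and the dataset format are all assumptions made for the example.

```python
import numpy as np

def fitted_q_iteration(dataset, n_actions, horizon, feature_fn, dim):
    """Generic fitted Q-iteration sketch (hypothetical names, not the paper's code).

    dataset:    list of (s, a, r, s_next) transition tuples
    feature_fn: maps (state, action) to a length-`dim` feature vector
    Returns per-step weights w[h] so that Q_h(s, a) ~= <w[h], feature_fn(s, a)>.
    """
    # w[horizon] stays zero: the value after the last step is 0.
    w = [np.zeros(dim) for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        X, y = [], []
        for (s, a, r, s_next) in dataset:
            # Regression target: reward plus greedy next-step value under w[h+1].
            q_next = max(feature_fn(s_next, a2) @ w[h + 1]
                         for a2 in range(n_actions))
            X.append(feature_fn(s, a))
            y.append(r + q_next)
        X, y = np.array(X), np.array(y)
        # Ridge-regularized least-squares fit of Q_h.
        w[h] = np.linalg.solve(X.T @ X + 1e-6 * np.eye(dim), X.T @ y)
    return w
```

On a toy two-state, two-action MDP with one-hot features over (s, a) pairs, the backward recursion recovers the usual dynamic-programming values step by step.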
Open Source Code | No | The paper does not state that source code for the described methodology is released, and it provides no link to a code repository. It mentions a license for the paper content itself and cites other work that empirically evaluates the authors' oracle, but not the authors' own code.
Open Datasets | No | The paper is theoretical and does not describe any experiments that use specific datasets. Therefore, it does not provide access information for any publicly available or open datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets. Thus, it does not provide any information regarding training/test/validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on algorithms, objectives, and sample complexity analysis. It does not contain any experimental evaluation that would require a description of hardware specifications.
Software Dependencies | No | The paper is theoretical and primarily presents algorithms and their theoretical properties; it does not list the software dependencies needed to implement them.
Experiment Setup | No | The paper is theoretical, focusing on algorithmic design, objectives, and sample complexity analysis, rather than empirical evaluation. Therefore, it does not provide specific experimental setup details such as hyperparameter values or system-level training settings.