Model-Free Representation Learning and Exploration in Low-Rank MDPs
Authors: Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we present the first model-free representation learning algorithms for low-rank MDPs. The key algorithmic contribution is a new minimax representation learning objective, for which we provide variants with differing tradeoffs in their statistical and computational properties. We interleave this representation learning step with an exploration strategy to cover the state space in a reward-free manner. The resulting algorithms are provably sample efficient and can accommodate general function approximation to scale to complex environments. Keywords: Reinforcement learning, representation learning, low-rank MDPs, sample complexity analysis, reward-free exploration |
| Researcher Affiliation | Collaboration | Aditya Modi (Microsoft, Mountain View, CA 94043, USA); Jinglin Chen (Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA); Akshay Krishnamurthy (Microsoft Research, New York, NY 10011, USA); Nan Jiang (Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA); Alekh Agarwal (Google Research) |
| Pseudocode | Yes | Algorithm 1: Moffle(R, Φ, ηmin, ε, δ), Model-Free Feature Learning and Exploration; Algorithm 2: Explore(Φ, ηmin, δ); Algorithm 3: Feature Selection via Greedy Improvement; Algorithm 4: Elliptical Planner with FQI and FQE; Algorithm 5: FQI (Fitted Q-Iteration); Algorithm 6: FQE (Fitted Q-Evaluation) |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released or provide a link to a code repository. It mentions a license for the paper content itself and cites other work that empirically evaluates their oracle, but not their own code. |
| Open Datasets | No | The paper is theoretical and does not describe any experiments that use specific datasets. Therefore, it does not provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets. Thus, it does not provide any information regarding training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithms, objectives, and sample complexity analysis. It does not contain any experimental evaluation that would require a description of hardware specifications. |
| Software Dependencies | No | The paper is theoretical and primarily presents algorithms and their theoretical properties. It does not specify any software dependencies (libraries, frameworks, or version requirements) needed to implement the described methods. |
| Experiment Setup | No | The paper is theoretical, focusing on algorithmic design, objectives, and sample complexity analysis, rather than empirical evaluation. Therefore, it does not provide specific experimental setup details such as hyperparameter values or system-level training settings. |
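Although the paper provides only pseudocode, the fitted Q-iteration subroutine it lists (Algorithm 5) is a standard procedure that can be illustrated concretely. The sketch below is a generic linear FQI on a toy two-state MDP, not the paper's Moffle implementation: the feature map `phi`, the dataset layout, and all names are illustrative assumptions.

```python
import numpy as np

def fitted_q_iteration(transitions, phi, num_actions, gamma=0.9, iters=200):
    """Generic linear FQI sketch (illustrative, not the paper's Algorithm 5).

    transitions: list of (s, a, r, s_next) tuples.
    phi: feature map phi(s, a) -> np.ndarray of fixed dimension d.
    Returns weights w such that Q(s, a) ~= phi(s, a) @ w.
    """
    d = phi(transitions[0][0], transitions[0][1]).shape[0]
    w = np.zeros(d)
    # Design matrix of features for the observed (s, a) pairs.
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    for _ in range(iters):
        # Bellman backup targets: r + gamma * max_a' Q(s', a').
        y = np.array([
            r + gamma * max(phi(s2, a2) @ w for a2 in range(num_actions))
            for _, _, r, s2 in transitions
        ])
        # Least-squares regression of targets onto features.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy deterministic MDP (assumed for illustration): states {0, 1};
# action 0 stays, action 1 switches states; reward 1 iff the current
# state is 1. One-hot features over (s, a) make the regression exact.
phi = lambda s, a: np.eye(4)[s * 2 + a]
data = [(s, a, float(s == 1), s if a == 0 else 1 - s)
        for s in range(2) for a in range(2)]
w = fitted_q_iteration(data, phi, num_actions=2, gamma=0.9)
```

With full state-action coverage and one-hot features, this reduces to exact Q-value iteration, so the learned `w` converges to the true Q-values (e.g. Q(1, stay) = 1/(1 - 0.9) = 10).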