Model-Free Representation Learning and Exploration in Low-Rank MDPs

Authors: Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we present the first model-free representation learning algorithms for low-rank MDPs. The key algorithmic contribution is a new minimax representation learning objective, for which we provide variants with differing tradeoffs in their statistical and computational properties. We interleave this representation learning step with an exploration strategy to cover the state space in a reward-free manner. The resulting algorithms are provably sample efficient and can accommodate general function approximation to scale to complex environments. Keywords: reinforcement learning, representation learning, low-rank MDPs, sample complexity analysis, reward-free exploration
Researcher Affiliation | Collaboration | Aditya Modi, Microsoft, Mountain View, CA 94043, USA; Jinglin Chen, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Akshay Krishnamurthy, Microsoft Research, New York, NY 10011, USA; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Alekh Agarwal, Google Research
Pseudocode | Yes | Algorithm 1: Moffle(R, Φ, ηmin, ε, δ), Model-Free Feature Learning and Exploration; Algorithm 2: Explore(Φ, ηmin, δ); Algorithm 3: Feature Selection via Greedy Improvement; Algorithm 4: Elliptical Planner with FQI and FQE; Algorithm 5: FQI (Fitted Q-Iteration); Algorithm 6: FQE (Fitted Q-Evaluation)
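For context on the fitted Q-iteration subroutine named above (Algorithm 5), a minimal sketch of generic FQI with linear function approximation is given below. This is an illustration of the textbook procedure only, not the paper's exact algorithm; the function names, the linear feature class, and the dataset format are all assumptions made for the example.

```python
import numpy as np

def fitted_q_iteration(dataset, n_actions, horizon, feature_fn, dim):
    """Generic fitted Q-iteration sketch (hypothetical names, not the paper's code).

    dataset:    list of (s, a, r, s_next) transition tuples
    feature_fn: maps (state, action) to a length-`dim` feature vector
    Returns per-step weights w[h] so that Q_h(s, a) ~= <w[h], feature_fn(s, a)>.
    """
    # w[horizon] stays zero: the value after the last step is 0.
    w = [np.zeros(dim) for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        X, y = [], []
        for (s, a, r, s_next) in dataset:
            # Regression target: reward plus greedy next-step value under w[h+1].
            q_next = max(feature_fn(s_next, a2) @ w[h + 1]
                         for a2 in range(n_actions))
            X.append(feature_fn(s, a))
            y.append(r + q_next)
        X, y = np.array(X), np.array(y)
        # Ridge-regularized least-squares fit of Q_h.
        w[h] = np.linalg.solve(X.T @ X + 1e-6 * np.eye(dim), X.T @ y)
    return w
```

On a toy two-state, two-action MDP with one-hot features over (s, a) pairs, the backward recursion recovers the usual dynamic-programming values step by step.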
Open Source Code | No | The paper does not state that source code for the described methodology is released, and it provides no link to a code repository. It mentions a license for the paper content itself and cites other work that empirically evaluates the authors' oracle, but not the authors' own code.
Open Datasets | No | The paper is theoretical and does not describe any experiments that use specific datasets. Therefore, it does not provide access information for any publicly available or open datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets. Thus, it does not provide any information regarding training/test/validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on algorithms, objectives, and sample complexity analysis. It does not contain any experimental evaluation that would require a description of hardware specifications.
Software Dependencies | No | The paper is theoretical and primarily presents algorithms and their theoretical properties; it does not list the software dependencies needed to implement them.
Experiment Setup | No | The paper is theoretical, focusing on algorithmic design, objectives, and sample complexity analysis, rather than empirical evaluation. Therefore, it does not provide specific experimental setup details such as hyperparameter values or system-level training settings.