Regularized Policy Iteration with Nonparametric Function Spaces

Authors: Amir-massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We analyze the statistical properties of REG-LSPI and provide an upper bound on the policy evaluation error and the performance loss of the policy returned by this method. Our bound shows the dependence of the loss on the number of samples, the capacity of the function space, and some intrinsic properties of the underlying Markov Decision Process. The dependence of the policy evaluation bound on the number of samples is minimax optimal. This is the first work that provides such a strong guarantee for a nonparametric approximate policy iteration algorithm."
Researcher Affiliation | Collaboration | Amir-massoud Farahmand (EMAIL), Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, 8th Floor, Cambridge, MA 02139, USA; Mohammad Ghavamzadeh (EMAIL), Adobe Research, 321 Park Avenue, San Jose, CA 95110, USA; Csaba Szepesvári (EMAIL), Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Shie Mannor (EMAIL), Department of Electrical Engineering, The Technion, Haifa 32000, Israel
Pseudocode | Yes |
Algorithm 1: Regularized Policy Iteration(K, Q̂^(−1), F^|A|, J, {(λ^(k)_Q,n, λ^(k)_h,n)}_{k=0}^{K−1})
    // K: number of iterations
    // Q̂^(−1): initial action-value function
    // F^|A|: the action-value function space
    // J: the regularizer
    // {(λ^(k)_Q,n, λ^(k)_h,n)}_{k=0}^{K−1}: the regularization coefficients
    for k = 0 to K−1 do
        π_k(·) ← π̂(·; Q̂^(k−1))
        Generate training samples D^(k)_n
        Q̂^(k) ← REG-LSTD/BRM(π_k, D^(k)_n; F^|A|, J, λ^(k)_Q,n, λ^(k)_h,n)
    end for
    return Q̂^(K−1) and π_K(·) = π̂(·; Q̂^(K−1))
Open Source Code | No | The paper does not provide concrete access to source code. It defers efficient implementations to future work, stating: "Designing scalable optimization algorithms for REG-LSPI/BRM is a topic for future work."
Open Datasets | No | The paper is theoretical and focuses on algorithm design and statistical properties. It introduces concepts like "a batch of data Dn" for theoretical analysis of an "offline learning scenario" (Section 2.3), but does not refer to any specific, publicly available dataset used for experiments or provide any links or citations for data access.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with specific datasets, so it provides no training/validation/test splits. It discusses theoretical sampling assumptions, e.g., that "samples Xi and Xi+1 may be sampled independently (we call this the Planning scenario)", but this does not concern practical data splitting for reproduction.
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware used for running simulations or computations. There are no mentions of specific GPU/CPU models, processors, or computing environments.
Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies or version numbers needed to replicate potential experimental results.
Experiment Setup | No | The paper is theoretical and describes algorithms and their statistical properties. It does not provide specific details about experimental setup, hyperparameters, optimizer settings, or training configurations for empirical evaluation.
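Since the paper itself provides no code, the structure of Algorithm 1 can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' implementation: it substitutes simple ridge (Tikhonov) regularization on a tabular feature vector for the paper's RKHS regularizer J, uses a small random finite MDP in place of a general state space, and names (`reg_lstd`, `feat`) are invented for illustration. It shows the loop of Algorithm 1 — regularized LSTD policy evaluation followed by greedy improvement — under those simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # random transition kernel P(s'|s,a)
R = rng.random((nS, nA))                       # random reward function r(s,a)

def feat(s, a):
    """One-hot feature vector over state-action pairs (tabular case)."""
    v = np.zeros(nS * nA)
    v[s * nA + a] = 1.0
    return v

def reg_lstd(D, pi, lam):
    """Ridge-regularized LSTD policy evaluation (stand-in for REG-LSTD).

    Solves (Phi^T (Phi - gamma * Phi') / n + lam * I) w = Phi^T r / n,
    where Phi' uses the next state and the evaluated policy's action.
    """
    Phi = np.array([feat(s, a) for s, a, _, _ in D])
    Phi2 = np.array([feat(s2, pi[s2]) for _, _, _, s2 in D])
    r = np.array([rew for _, _, rew, _ in D])
    n = len(D)
    A = Phi.T @ (Phi - gamma * Phi2) / n + lam * np.eye(nS * nA)
    b = Phi.T @ r / n
    return np.linalg.solve(A, b)

# Collect a batch of transitions D_n (offline/batch setting)
D = []
for _ in range(2000):
    s, a = int(rng.integers(nS)), int(rng.integers(nA))
    s2 = int(rng.choice(nS, p=P[s, a]))
    D.append((s, a, R[s, a], s2))

pi = np.zeros(nS, dtype=int)          # arbitrary initial policy
for k in range(10):                    # K policy-iteration steps
    w = reg_lstd(D, pi, lam=1e-3)      # regularized policy evaluation
    Q = w.reshape(nS, nA)              # tabular features => Q recovers a table
    pi = Q.argmax(axis=1)              # greedy improvement pi_{k+1}(s)

print("greedy policy:", pi)
```

With tabular one-hot features the solve reduces to (slightly regularized) exact policy evaluation, so the loop behaves like standard policy iteration; the interesting regime in the paper is the nonparametric one, where F^|A| is an RKHS and the regularizer controls its effective capacity.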