reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Complete Characterization of Linear Estimators for Offline Policy Evaluation

Authors: Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Despite significant study, a sharp characterization of when we might expect offline policy evaluation to be tractable, even in the simplest setting of linear function approximation, has so far remained elusive... In this work, we identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods... to succeed... We prove that LSTD works under strictly weaker conditions than FQI. Furthermore, we establish that if a problem is not solvable via LSTD, then it cannot be solved by a broad class of linear estimators, even in the limit of infinite data. Taken together, our results provide a complete picture of the behavior of linear estimators for offline policy evaluation, unify previously disparate analyses of canonical algorithms, and provide significantly sharper notions of the underlying statistical complexity of offline policy evaluation.
Researcher Affiliation	Collaboration	Juan C. Perdomo EMAIL 537 Soda Hall, University of California, Berkeley 94709 Akshay Krishnamurthy EMAIL Microsoft Research Peter Bartlett EMAIL University of California, Berkeley Sham Kakade EMAIL Harvard University
Pseudocode	No	The paper describes FQI and LSTD algorithms in prose and mathematical notation (e.g., "Fitted Q-iteration iteratively solves least squares regression problems of the form...", "LSTD tries to approximate θ γ by computing the plug-in estimate to the closed-form solution..."), but it does not present any formal pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories.
Open Datasets	No	In the OPE problem, we are given a policy π and a dataset {(si, ai, ri(si, ai), s i, a i)}n i=1 of observed transitions and rewards, where the initial pair (si, ai) is sampled from some arbitrary distribution D... This refers to a generic dataset structure, not a specific, publicly available one.
Dataset Splits	No	The paper is theoretical and does not use any specific datasets for experimentation, therefore it does not discuss dataset splits.
Hardware Specification	No	The paper focuses on theoretical characterization and proofs and does not describe any experimental procedures or hardware used for computation.
Software Dependencies	No	The paper is theoretical and does not describe any implementation details or software dependencies with specific version numbers.
Experiment Setup	No	The paper is theoretical and does not present any experimental setup details, such as hyperparameter values or training configurations.