On Instrumental Variable Regression for Deep Offline Policy Evaluation

Authors: Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

JMLR 2022

Reproducibility assessment (each entry gives the variable, the result, and the LLM response):
Research Type: Experimental. "We evaluate the performance of these techniques empirically on a variety of tasks and environments, including Behaviour Suite (BSuite) (Osband et al., 2019) and DeepMind Control Suite (DM Control) (Tassa et al., 2020). We found experimentally that some of the recent IV techniques such as AGMM display performance on par with state-of-the-art FQE methods."
Researcher Affiliation: Collaboration. Yutian Chen (DeepMind, R7, 14-18 Handyside Street, King's Cross, London N1C 4DN); Liyuan Xu (Gatsby Unit); Caglar Gulcehre (DeepMind); Tom Le Paine (DeepMind); Arthur Gretton (Gatsby Unit); Nando de Freitas (DeepMind); Arnaud Doucet (DeepMind).
Pseudocode: No. The paper describes various algorithms and methods (e.g., LSTD, Deep IV, KIV, DFIV, GMM, AGMM, ASEM) through mathematical formulations and prose, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code: Yes. "We open-source all our code and datasets at https://github.com/liyuan9988/IVOPE with ACME."
Open Datasets: Yes. "We open-source all our code and datasets at https://github.com/liyuan9988/IVOPE with ACME." "We consider a list of reinforcement learning environments from two widely used task collections: Behaviour Suite (BSuite) (Osband et al., 2019) and DeepMind Control Suite (DM Control) (Tassa et al., 2020)."
Dataset Splits: Yes. "The dataset is then split randomly into training and validation subsets with a ratio of 9:1." Table 1 (BSuite tasks) and Table 2 (DM Control Suite tasks) both state: "The training and validation data ratio is 9:1."
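The random 9:1 train/validation split described above can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the authors' code; the function name and transition representation are assumptions.

```python
import random

def split_dataset(transitions, train_ratio=0.9, seed=0):
    """Randomly split a list of transitions into training and validation
    subsets (9:1 by default), matching the ratio reported in the paper."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    indices = list(range(len(transitions)))
    rng.shuffle(indices)
    cut = int(train_ratio * len(indices))
    train = [transitions[i] for i in indices[:cut]]
    valid = [transitions[i] for i in indices[cut:]]
    return train, valid

data = list(range(1000))          # placeholder transitions
train, valid = split_dataset(data)
print(len(train), len(valid))     # 900 100
```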
Hardware Specification: No. The paper does not provide specific hardware details (such as CPU or GPU models, or cloud computing instance types) used for running the experiments; it focuses on the software architecture and hyperparameters.
Software Dependencies: No. The paper mentions the ACME library (Hoffman et al., 2020) and the OAdam and Adam optimizers, but it does not specify version numbers for these or any other software dependencies, such as programming languages or deep learning frameworks.
Experiment Setup: Yes. "We compare a list of representative non-linear IV methods, including Kernel IV (KIV), Deep IV, Deep Feature IV (DFIV) and three adversarial IV methods: Deep GMM, Adversarial GMM Networks (AGMM), Adversarial approach to structural equation models (ASEM). We also include as baselines the deterministic Bellman residual minimization (DBRM) and two variants of the fitted Q evaluation methods with a deterministic (FQE) and distributional (DFQE) Q representation respectively." "All algorithms except KIV use the same network architecture to estimate the Q function as in the trained agent for a fair comparison. For BSuite tasks, the Q network is an MLP with layer size 50-50-1 and ReLU activation. The input is a concatenation of the flattened observation and one-hot encoding of the discrete action variable. For DM Control tasks, it is an MLP with layer size 512-512-256-1, ELU activation and a layer normalization after the first hidden layer." "We use OAdam for adversarial methods as suggested by Bennett et al. (2019b); Dikkala et al. (2020) and Adam for other methods." "We run a thorough hyper-parameter search for every algorithm in every environment. We randomly sample up to 100 hyper-parameter settings for every algorithm and choose the setting with the best metric on a held-out validation dataset." See Appendix B for hyper-parameter selection.
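The two Q-network architectures quoted above (a 50-50-1 ReLU MLP for BSuite and a 512-512-256-1 ELU MLP with layer normalization for DM Control) can be sketched with a minimal NumPy forward pass. This is a shape-level illustration only: the weights are random placeholders, the paper's actual implementation builds on the ACME library, and the exact ordering of the layer normalization relative to the activation is an assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def mlp_forward(x, sizes, activation, ln_after_first=False, seed=0):
    """Forward pass of an MLP with the given layer sizes.
    The final layer is linear (it outputs the scalar Q value)."""
    rng = np.random.default_rng(seed)
    h = x
    for i, out_dim in enumerate(sizes):
        W = rng.standard_normal((h.shape[-1], out_dim)) / np.sqrt(h.shape[-1])
        h = h @ W
        if i < len(sizes) - 1:  # no activation on the final linear output
            h = activation(h)
            if ln_after_first and i == 0:  # assumed placement of the layer norm
                h = layer_norm(h)
    return h

# BSuite Q network: flattened observation concatenated with a one-hot action,
# fed through a 50-50-1 MLP with ReLU.
obs, num_actions, action = np.ones(6), 3, 1
x = np.concatenate([obs, np.eye(num_actions)[action]])
q_bsuite = mlp_forward(x, [50, 50, 1], relu)

# DM Control Q network: 512-512-256-1 MLP with ELU and a layer norm
# after the first hidden layer (input dimension here is illustrative).
q_dmc = mlp_forward(np.ones(24), [512, 512, 256, 1], elu, ln_after_first=True)
print(q_bsuite.shape, q_dmc.shape)  # (1,) (1,)
```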