An empirical study of implicit regularization in deep offline RL

Authors: Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matthew Hoffman, Razvan Pascanu, Arnaud Doucet

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab.
Researcher Affiliation Industry Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet (DeepMind)
Pseudocode Yes Here, we present the Python code stub that we used across our experiments (similar to Kumar et al. (2020a)) to compute the feature ranks of the pre-output layer's features:

import numpy as np

def compute_rank_from_features(feature_matrix, rank_delta=0.01):
  """Computes rank of the features based on how many singular values are significant."""
  sing_values = np.linalg.svd(feature_matrix, compute_uv=False)
  cumsum = np.cumsum(sing_values)
  nuclear_norm = np.sum(sing_values)
  approximate_rank_threshold = 1.0 - rank_delta
  threshold_crossed = (
      cumsum >= approximate_rank_threshold * nuclear_norm)
  effective_rank = sing_values.shape[0] - np.sum(threshold_crossed) + 1
  return effective_rank
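As a quick sanity check, the stub computes the number of singular values needed to retain a (1 - rank_delta) fraction of the nuclear norm, so matrices of known rank behave as expected. A self-contained sketch (the function is restated here so the snippet runs on its own; the test matrices are illustrative, not from the paper):

```python
import numpy as np

def compute_rank_from_features(feature_matrix, rank_delta=0.01):
    """Effective rank: how many singular values are needed to retain
    (1 - rank_delta) of the nuclear norm (as in Kumar et al., 2020a)."""
    sing_values = np.linalg.svd(feature_matrix, compute_uv=False)
    cumsum = np.cumsum(sing_values)
    nuclear_norm = np.sum(sing_values)
    threshold_crossed = cumsum >= (1.0 - rank_delta) * nuclear_norm
    return sing_values.shape[0] - np.sum(threshold_crossed) + 1

# A full-rank matrix keeps all singular values significant...
print(compute_rank_from_features(np.eye(8)))        # 8
# ...while a rank-one matrix collapses to a single significant value.
print(compute_rank_from_features(np.ones((8, 8))))  # 1
```

Note that without the two minus signs (in the threshold and in the final count) the stub would be meaningless, which is why restoring them matters when transcribing the pseudocode.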
Open Source Code No No explicit statement or link to the authors' own source code for the methodology described in the paper is provided. The acknowledgements mention third-party tools used for the software infrastructure.
Open Datasets Yes To test and verify different aspects of the rank collapse hypothesis and its potential impact on the agent performance, we ran a large number of experiments on bsuite (Osband et al., 2019), Atari (Bellemare et al., 2013) and DeepMind Lab (Beattie et al., 2016) environments.
Dataset Splits Yes In all these experiments, we use the experimental protocol, datasets and hyperparameters from Gulcehre et al. (2020) unless stated otherwise. We uniformly sampled different proportions (from 5% of the transitions to the entire dataset) from the transitions in the RL Unplugged Atari benchmark dataset (Gulcehre et al., 2020) to understand how the agent behaves with different amounts of training data and whether this is a factor affecting the rank of the network.
Hardware Specification No The paper does not provide specific hardware details (such as GPU or CPU models, memory, or detailed cloud instance types) used for running the experiments. It only mentions using the DeepMind JAX ecosystem for software infrastructure.
Software Dependencies No The paper mentions using 'Acme (Hoffman et al., 2020), Jax (Bradbury et al., 2018) and the Deepmind JAX ecosystem (Babuschkin et al., 2020)' for software infrastructure, but does not provide specific version numbers for these or any other key software dependencies.
Experiment Setup Yes In all these experiments, we use the experimental protocol, datasets and hyperparameters from Gulcehre et al. (2020) unless stated otherwise. We provide the details of architectures and their default hyperparameters in Appendix A.11. Table 3: The default hyper-parameters used in our work across different domains.

Hyper-parameter               | bsuite | Atari | DeepMind Lab
Training batch size           | 32     | 256   | 4 (episodes)
Rank calculation batch size   | 512    | 512   | 512
Num training steps            | 1e6    | 2e6   | 2e4
Learning rate                 | 3e-4   | 3e-5  | 1e-3
Optimizer                     | Adam   | Adam  | Adam
Feedforward hidden layer size | 64     | 512   | 256
Num hidden layers             | 2      | 1     | 1
Activation                    | ReLU   | ReLU  | ReLU (tanh for LSTM gates)
Memory                        | None   | None  | LSTM
Discount                      | 0.99   | 0.99  | 0.99
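For reference, the Table 3 defaults can be collected into a plain configuration mapping. This is a minimal sketch; the dictionary names and nesting are illustrative and do not come from the authors' codebase:

```python
# Per-domain defaults transcribed from Table 3 of the paper.
# Key names are illustrative; the paper does not publish its config code.
DOMAIN_HPARAMS = {
    "bsuite": {"batch_size": 32, "num_steps": 1_000_000, "lr": 3e-4,
               "hidden_size": 64, "num_hidden_layers": 2, "memory": None},
    "atari":  {"batch_size": 256, "num_steps": 2_000_000, "lr": 3e-5,
               "hidden_size": 512, "num_hidden_layers": 1, "memory": None},
    "dmlab":  {"batch_size": 4, "num_steps": 20_000, "lr": 1e-3,
               "hidden_size": 256, "num_hidden_layers": 1, "memory": "LSTM"},
}

# Settings shared across all three domains.
SHARED_HPARAMS = {"rank_batch_size": 512, "optimizer": "Adam",
                  "activation": "relu", "discount": 0.99}

def get_config(domain):
    """Merge shared defaults with per-domain overrides."""
    return {**SHARED_HPARAMS, **DOMAIN_HPARAMS[domain]}
```

Collecting the shared values separately keeps the per-domain entries limited to what actually differs, which mirrors how the paper's table varies only a handful of settings across domains.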