An empirical study of implicit regularization in deep offline RL

Authors: Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matthew Hoffman, Razvan Pascanu, Arnaud Doucet

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab.
Researcher Affiliation Industry Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet (DeepMind)
Pseudocode Yes Here, we present the Python code stub that we used across our experiments (similar to Kumar et al. (2020a)) to compute the feature ranks of the pre-output layer's features:

import numpy as np

def compute_rank_from_features(feature_matrix, rank_delta=0.01):
  """Computes rank of the features based on how many singular values are significant."""
  sing_values = np.linalg.svd(feature_matrix, compute_uv=False)
  cumsum = np.cumsum(sing_values)
  nuclear_norm = np.sum(sing_values)
  approximate_rank_threshold = 1.0 - rank_delta
  threshold_crossed = (
      cumsum >= approximate_rank_threshold * nuclear_norm)
  effective_rank = sing_values.shape[0] - np.sum(threshold_crossed) + 1
  return effective_rank
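As a quick sanity check, the stub computes the number of singular values needed to retain a (1 - rank_delta) fraction of the nuclear norm, so matrices of known rank behave as expected. A self-contained sketch (the function is restated here so the snippet runs on its own; the test matrices are illustrative, not from the paper):

```python
import numpy as np

def compute_rank_from_features(feature_matrix, rank_delta=0.01):
    """Effective rank: how many singular values are needed to retain
    (1 - rank_delta) of the nuclear norm (as in Kumar et al., 2020a)."""
    sing_values = np.linalg.svd(feature_matrix, compute_uv=False)
    cumsum = np.cumsum(sing_values)
    nuclear_norm = np.sum(sing_values)
    threshold_crossed = cumsum >= (1.0 - rank_delta) * nuclear_norm
    return sing_values.shape[0] - np.sum(threshold_crossed) + 1

# A full-rank matrix keeps all singular values significant...
print(compute_rank_from_features(np.eye(8)))        # 8
# ...while a rank-one matrix collapses to a single significant value.
print(compute_rank_from_features(np.ones((8, 8))))  # 1
```

Note that without the two minus signs (in the threshold and in the final count) the stub would be meaningless, which is why restoring them matters when transcribing the pseudocode.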
Open Source Code No No explicit statement or link to the authors' own source code for the methodology described in the paper is provided. The acknowledgements mention third-party tools used for the software infrastructure.
Open Datasets Yes To test and verify different aspects of the rank collapse hypothesis and its potential impact on the agent performance, we ran a large number of experiments on bsuite (Osband et al., 2019), Atari (Bellemare et al., 2013) and DeepMind Lab (Beattie et al., 2016) environments.
Dataset Splits Yes In all these experiments, we use the experimental protocol, datasets and hyperparameters from Gulcehre et al. (2020) unless stated otherwise. We uniformly sampled different proportions (from 5% of the transitions to the entire dataset) from the transitions in the RL Unplugged Atari benchmark dataset (Gulcehre et al., 2020) to understand how the agent behaves with different amounts of training data and whether this is a factor affecting the rank of the network.
Hardware Specification No The paper does not provide specific hardware details (such as GPU or CPU models, memory, or detailed cloud instance types) used for running the experiments. It only mentions using the DeepMind JAX ecosystem for software infrastructure.
Software Dependencies No The paper mentions using 'Acme (Hoffman et al., 2020), Jax (Bradbury et al., 2018) and the Deepmind JAX ecosystem (Babuschkin et al., 2020)' for software infrastructure, but does not provide specific version numbers for these or any other key software dependencies.
Experiment Setup Yes In all these experiments, we use the experimental protocol, datasets and hyperparameters from Gulcehre et al. (2020) unless stated otherwise. We provide the details of architectures and their default hyperparameters in Appendix A.11. Table 3: The default hyper-parameters used in our work across different domains.

Hyper-parameter               | bsuite | Atari | DeepMind Lab
Training batch size           | 32     | 256   | 4 (episodes)
Rank calculation batch size   | 512    | 512   | 512
Num training steps            | 1e6    | 2e6   | 2e4
Learning rate                 | 3e-4   | 3e-5  | 1e-3
Optimizer                     | Adam   | Adam  | Adam
Feedforward hidden layer size | 64     | 512   | 256
Num hidden layers             | 2      | 1     | 1
Activation                    | ReLU   | ReLU  | ReLU (tanh for LSTM gates)
Memory                        | None   | None  | LSTM
Discount                      | 0.99   | 0.99  | 0.99
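For reference, the Table 3 defaults can be collected into a plain configuration mapping. This is a minimal sketch; the dictionary names and nesting are illustrative and do not come from the authors' codebase:

```python
# Per-domain defaults transcribed from Table 3 of the paper.
# Key names are illustrative; the paper does not publish its config code.
DOMAIN_HPARAMS = {
    "bsuite": {"batch_size": 32, "num_steps": 1_000_000, "lr": 3e-4,
               "hidden_size": 64, "num_hidden_layers": 2, "memory": None},
    "atari":  {"batch_size": 256, "num_steps": 2_000_000, "lr": 3e-5,
               "hidden_size": 512, "num_hidden_layers": 1, "memory": None},
    "dmlab":  {"batch_size": 4, "num_steps": 20_000, "lr": 1e-3,
               "hidden_size": 256, "num_hidden_layers": 1, "memory": "LSTM"},
}

# Settings shared across all three domains.
SHARED_HPARAMS = {"rank_batch_size": 512, "optimizer": "Adam",
                  "activation": "relu", "discount": 0.99}

def get_config(domain):
    """Merge shared defaults with per-domain overrides."""
    return {**SHARED_HPARAMS, **DOMAIN_HPARAMS[domain]}
```

Collecting the shared values separately keeps the per-domain entries limited to what actually differs, which mirrors how the paper's table varies only a handful of settings across domains.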