Predictive Information Accelerates Learning in RL

Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels.
Researcher Affiliation | Collaboration | Kuang-Huei Lee (Google Research), Ian Fischer (Google Research), Anthony Z. Liu (University of Michigan), Yijie Guo (University of Michigan), Honglak Lee (Google Research), John Canny (Google Research), Sergio Guadarrama (Google Research)
Pseudocode | Yes | Algorithm 1: Training Algorithm for PI-SAC
Open Source Code | Yes | Our implementation is given on GitHub: https://github.com/google-research/pisac
Open Datasets | Yes | We evaluate PI-SAC on the DeepMind control suite [42] and compare with leading model-free and model-based approaches for continuous control from pixels.
Dataset Splits | No | The paper uses continuous control environments and does not specify traditional train/validation/test dataset splits with percentages or counts, as is common in supervised learning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions using standard hyperparameters and architectures similar to other works, but does not list software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Throughout these experiments we mostly use the standard SAC hyperparameters [16], including the sizes of the actor and critic networks, learning rates, and target critic update rate. Unless otherwise specified, we set CEB β = 0.01. We report our results with the best number of gradient updates per environment step in Section 4.1, and use one gradient update per environment step for the rest of the experiments. Full details of hyperparameters are listed in Section A.2.
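The experiment setup above can be summarized as a small configuration sketch. Only the CEB β = 0.01 default and the one-gradient-update-per-environment-step default are stated in the text; the remaining keys and the "standard SAC" values filled in here are assumptions for illustration, and the helper name `updates_for_steps` is hypothetical:

```python
# Hedged sketch of the PI-SAC setup described above.
# Stated in the text: ceb_beta = 0.01 and one gradient update per
# environment step (unless otherwise specified). All other entries are
# assumed "standard SAC" values and may differ from the paper's Section A.2.
pisac_config = {
    "ceb_beta": 0.01,                    # CEB compression coefficient (stated)
    "gradient_updates_per_env_step": 1,  # default update ratio (stated)
    # Assumed standard SAC hyperparameters (not specified in this excerpt):
    "actor_learning_rate": 3e-4,
    "critic_learning_rate": 3e-4,
    "target_critic_update_tau": 0.005,
}

def updates_for_steps(config, env_steps):
    """Total gradient updates performed over a given number of environment steps."""
    return config["gradient_updates_per_env_step"] * env_steps
```

With the default ratio of one, a run of 1,000 environment steps performs 1,000 gradient updates; Section 4.1 of the paper instead reports results with the best-performing ratio.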