Predictive Information Accelerates Learning in RL

Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels.
Researcher Affiliation | Collaboration | Kuang-Huei Lee (Google Research), Ian Fischer (Google Research), Anthony Z. Liu (University of Michigan), Yijie Guo (University of Michigan), Honglak Lee (Google Research), John Canny (Google Research), Sergio Guadarrama (Google Research)
Pseudocode | Yes | Algorithm 1: Training Algorithm for PI-SAC
Open Source Code | Yes | Our implementation is given on GitHub: https://github.com/google-research/pisac
Open Datasets | Yes | We evaluate PI-SAC on the DeepMind control suite [42] and compare with leading model-free and model-based approaches for continuous control from pixels.
Dataset Splits | No | The paper uses continuous control environments and does not specify traditional train/validation/test dataset splits with percentages or counts, as is common in supervised learning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions using standard hyperparameters and architectures similar to other works, but does not list software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Throughout these experiments we mostly use the standard SAC hyperparameters [16], including the sizes of the actor and critic networks, learning rates, and target critic update rate. Unless otherwise specified, we set CEB β = 0.01. We report our results with the best number of gradient updates per environment step in Section 4.1, and use one gradient update per environment step for the rest of the experiments. Full details of hyperparameters are listed in Section A.2.
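The experiment setup above can be summarized as a small configuration sketch. Only the CEB β = 0.01 default and the one-gradient-update-per-environment-step default are stated in the text; the remaining keys and the "standard SAC" values filled in here are assumptions for illustration, and the helper name `updates_for_steps` is hypothetical:

```python
# Hedged sketch of the PI-SAC setup described above.
# Stated in the text: ceb_beta = 0.01 and one gradient update per
# environment step (unless otherwise specified). All other entries are
# assumed "standard SAC" values and may differ from the paper's Section A.2.
pisac_config = {
    "ceb_beta": 0.01,                    # CEB compression coefficient (stated)
    "gradient_updates_per_env_step": 1,  # default update ratio (stated)
    # Assumed standard SAC hyperparameters (not specified in this excerpt):
    "actor_learning_rate": 3e-4,
    "critic_learning_rate": 3e-4,
    "target_critic_update_tau": 0.005,
}

def updates_for_steps(config, env_steps):
    """Total gradient updates performed over a given number of environment steps."""
    return config["gradient_updates_per_env_step"] * env_steps
```

With the default ratio of one, a run of 1,000 environment steps performs 1,000 gradient updates; Section 4.1 of the paper instead reports results with the best-performing ratio.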