Learning Scalable Deep Kernels with Recurrent Structure
Authors: Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP-LSTM are uniquely valuable. [...] and present an extensive empirical evaluation of our model. Specifically, we apply our model to a number of tasks, including system identification, energy forecasting, and self-driving car applications. Quantitatively, the model is assessed on the data ranging in size from hundreds of points to almost a million with various signal-to-noise ratios demonstrating state-of-the-art performance and linear scaling of our approach. |
| Researcher Affiliation | Academia | Maruan Al-Shedivat EMAIL Carnegie Mellon University Andrew Gordon Wilson EMAIL Cornell University Yunus Saatchi EMAIL Zhiting Hu EMAIL Carnegie Mellon University Eric P. Xing EMAIL Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Semi-stochastic alternating gradient descent. [...] Algorithm 2 Semi-stochastic asynchronous gradient descent. |
| Open Source Code | Yes | We release our code as a library at: http://github.com/alshedivat/keras-gp. This library implements the ideas in this paper as well as deep kernel learning (Wilson et al., 2016a) via a Gaussian process layer that can be added to arbitrary deep architectures and deep learning frameworks, following the Keras API specification. |
| Open Datasets | Yes | In the first set of experiments, we used publicly available nonlinear system identification datasets: Actuator6 (Sjöberg et al., 1995) and Drives7 (Wigren, 2010). [...] The smart grid data were taken from Global Energy Forecasting Kaggle competitions organized in 2012. [...] The dataset is proprietary. It was released in part for public use under the Creative Commons Attribution 3.0 license: http://archive.org/details/comma-dataset. |
| Dataset Splits | Yes | For the smart grid prediction tasks we used LSTM and GP-LSTM models with 48 hour time lags and were predicting the target values one hour ahead. LSTM and GP-LSTM were trained with one or two layers and 32 to 256 hidden units. The best models were selected on 25% of the training data used for validation. For autonomous driving prediction tasks, we used the same architectures but with 128 time steps of lag (1.28 s). [...] We considered the data from the first trip for training and from the second trip for validation and testing. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experiments. The paper discusses scalability in terms of time per epoch and time per test point, but not the underlying hardware. |
| Software Dependencies | No | Recurrent parts of each model were implemented using the Keras11 library. We extended Keras with the GP layer and developed a backend engine based on the GPML library12. Our approach allows us to take full advantage of the functionality available in Keras and GPML, e.g., use automatic differentiation for the recurrent part of the model. Our code is available at http://github.com/alshedivat/keras-gp/. 11. http://www.keras.io 12. http://www.gaussianprocess.org/gpml/code/matlab/doc/ While Keras and GPML are mentioned, specific version numbers are not provided; these would be needed for a fully reproducible description of the software dependencies. |
| Experiment Setup | Yes | For both smart grid prediction tasks we used LSTM and GP-LSTM models with 48 hour time lags and were predicting the target values one hour ahead. LSTM and GP-LSTM were trained with one or two layers and 32 to 256 hidden units. The best models were selected on 25% of the training data used for validation. For autonomous driving prediction tasks, we used the same architectures but with 128 time steps of lag (1.28 s). All models were regularized with dropout (Srivastava et al., 2014; Gal and Ghahramani, 2016b). [...] The LSTM architecture was the same as described in the previous section: it was transforming multi-dimensional sequences of inputs to a two-dimensional representation. We trained the model for 10 epochs on 10%, 20%, 40%, and 80% of the training set with 100, 200, and 400 inducing points per dimension and measured the average training time per epoch and the average prediction time per testing point. Table 5: Summary of the feedforward and recurrent neural architectures and the corresponding hyperparameters used in the experiments. GP-based models used the same architectures as their non-GP counterparts. Activations are given for the hidden units; vanilla neural nets used linear output activations. |
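The Pseudocode row cites "Algorithm 1: Semi-stochastic alternating gradient descent", which alternates between stochastic minibatch updates of the network weights and full-data updates of the GP hyperparameters. The following toy numpy sketch illustrates that alternation only; the linear feature map, RBF marginal-likelihood objective, and finite-difference gradients here are stand-ins (the paper's implementation uses an LSTM, structured-kernel inference, and automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h_dim = 40, 3, 2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

def nlml(W, log_ell, Xb, yb):
    """GP negative log marginal likelihood with an RBF kernel on
    features produced by a toy linear 'network' h = X @ W."""
    H = Xb @ W
    sq = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq / np.exp(2 * log_ell)) + 0.1 * np.eye(len(yb))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yb))
    return 0.5 * yb @ alpha + np.log(np.diag(L)).sum()

def fd_grad(f, theta, eps=1e-5):
    """Finite-difference gradient (a sketch; the paper uses autodiff)."""
    g = np.zeros_like(theta)
    for i in np.ndindex(theta.shape):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

W = rng.normal(size=(d, h_dim)) * 0.1
log_ell = np.array(0.0)
lr, losses = 0.05, []
for step in range(30):
    # (1) stochastic step on the network weights, using a minibatch
    idx = rng.choice(n, size=10, replace=False)
    W -= lr * fd_grad(lambda w: nlml(w, log_ell, X[idx], y[idx]), W)
    # (2) full-data step on the GP hyperparameter (log lengthscale)
    g_ell = fd_grad(lambda le: nlml(W, le, X, y), np.atleast_1d(log_ell))
    log_ell = log_ell - lr * g_ell[0]
    losses.append(nlml(W, log_ell, X, y))

print(losses[0], losses[-1])
```

The two-phase loop mirrors the algorithm's structure: the expensive full-dataset computation is confined to the (few) kernel hyperparameters, while the many network weights are updated cheaply on minibatches.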
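The Dataset Splits and Experiment Setup rows describe 48-hour input lags with a one-hour-ahead target and a validation set carved from 25% of the training data. A minimal sketch of that windowing and split, assuming a 1-D hourly series (the `make_lagged_windows` helper and the toy signal are hypothetical; only the 48/1 lag/horizon and 25% figures come from the paper):

```python
import numpy as np

def make_lagged_windows(series, lag, horizon):
    """Slice a 1-D series into (window, target) pairs: each input is
    `lag` consecutive steps; the target lies `horizon` steps after
    the window ends."""
    X, y = [], []
    for t in range(len(series) - lag - horizon + 1):
        X.append(series[t:t + lag])
        y.append(series[t + lag + horizon - 1])
    return np.array(X), np.array(y)

# toy hourly signal standing in for the smart-grid load data
hourly_load = np.sin(np.arange(500) * 2 * np.pi / 24)
X, y = make_lagged_windows(hourly_load, lag=48, horizon=1)
print(X.shape, y.shape)

# hold out 25% of the training windows for model selection
split = int(0.75 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]
```

For the autonomous-driving tasks the same construction would apply with `lag=128` (1.28 s at 100 Hz) and per-trip splits rather than a fractional holdout.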