Scalable Gaussian Processes with Latent Kronecker Structure
Authors: Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, David Eriksson, José Miguel Hernández-Lobato, Eytan Bakshy
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct three distinct experiments to empirically evaluate LKGPs: inverse dynamics prediction for robotics, learning curve prediction in an AutoML setting, and prediction of missing values in spatiotemporal climate data. In the first experiment, we compare LKGP against iterative methods without latent Kronecker structure. In the second and third experiments, we compare our method to various sparse and variational methods. |
| Researcher Affiliation | Collaboration | 1Meta 2University of Cambridge 3Max Planck Institute for Intelligent Systems, Tübingen. |
| Pseudocode | No | The paper describes methods textually and mathematically but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All methods are implemented using GPyTorch (Gardner et al., 2018a), and the source code is available at https://github.com/jandylin/Latent-Kronecker-GPs. |
| Open Datasets | Yes | The SARCOS dataset is publicly available at http://gaussianprocess.org/gpml/data/ (...) We use data from LCBench (Zimmer et al., 2021), available at https://github.com/automl/LCBench under the Apache License, Version 2.0 (...) We used the Nordic Gridded Climate Dataset (Tveito et al., 2000; 2005), available at https://cds.climate.copernicus.eu/datasets/insitu-gridded-observations-nordic under the License to Use Copernicus Products |
| Dataset Splits | Yes | Inverse Dynamics Prediction: We select subsets of the training data split with p = 5000 joint positions, velocities, and accelerations, and their corresponding q = 7 joint torques, and introduce 10%, 20%, ..., 90% missing values uniformly at random, resulting in n ≈ 35k. Learning Curve Prediction: Out of the p = 2000 curves per dataset, 10% were provided as fully observed during training and the remaining 90% were partially observed, with the stopping point chosen uniformly at random. Climate Data with Missing Values: Missing values were selected uniformly at random with ratios of 10%, 20%, ..., 50%. |
| Hardware Specification | Yes | All experiments were conducted on A100 GPUs with a total compute time of around 2000 hours. |
| Software Dependencies | No | All methods are implemented using GPyTorch (Gardner et al., 2018a)... For all our experiments, we start with a gridded dataset and introduce missing values which are withheld during training and used as test data... For all methods, observation noise and kernel hyperparameters were initialized with GPyTorch default values, and optimized using Adam. No specific version numbers for GPyTorch or other libraries are provided. |
| Experiment Setup | Yes | Inverse Dynamics Prediction: For both methods, observation noise and kernel hyperparameters were initialized with GPyTorch default values, and inferred by using Adam with a learning rate of 0.1 to maximize the marginal likelihood for 50 iterations. Additionally, both methods use conjugate gradients with a relative residual norm tolerance of 0.01 as the iterative linear system solver. Learning Curve Prediction: LKGP was trained for 100 iterations using a learning rate of 0.1, conjugate gradients with a relative residual norm tolerance of 0.01, and a pivoted Cholesky preconditioner of rank 100. SVGP was trained for 30 epochs using a learning rate of 0.01, a batch size of 1000, and 10000 inducing points, which were initialized at random training data examples. VNNGP was trained for 1000 epochs using a learning rate of 0.01, a batch size of 1000, inducing points placed at every training example, and 256 nearest neighbors. CaGP was trained for 1000 epochs using a learning rate of 0.1 and 512 actions. Climate Data with Missing Values: LKGP was trained for 100 iterations using a learning rate of 0.1, conjugate gradients with a relative residual norm tolerance of 0.01, and a pivoted Cholesky preconditioner of rank 100. SVGP was trained for 5 epochs using a learning rate of 0.001, a batch size of 1000, and 10000 inducing points, which were initialized at random training data examples. VNNGP was trained for 50 epochs using a learning rate of 0.001, a batch size of 1000, 500000 inducing points placed at random training data examples, and 256 nearest neighbors. CaGP was trained for 50 epochs using a learning rate of 0.1 and 256 actions. |
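The split protocol described above (start from a fully gridded dataset, withhold entries uniformly at random, and use the withheld entries as test data) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the grid sizes below are placeholders (the SARCOS setup uses p = 5000 inputs and q = 7 torque outputs), and the 30% missing ratio is one point from the paper's 10%–90% sweep.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder grid: p input locations x q output tasks
# (the paper's SARCOS experiment uses p = 5000, q = 7).
p, q = 50, 7
Y = rng.normal(size=(p, q))  # fully observed gridded targets

missing_ratio = 0.3  # the paper sweeps 10%, 20%, ..., 90%
mask = rng.random(size=(p, q)) < missing_ratio  # True = withheld

# Withheld entries are removed from training and kept as test data.
Y_train = np.where(mask, np.nan, Y)
Y_test = Y[mask]
```

Because the missingness is applied to the full p × q grid, the expected training set size is p · q · (1 − ratio), which matches the paper's reported n ≈ 35k for 10% missing values on the 5000 × 7 SARCOS grid.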
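Several of the setups above rely on conjugate gradients with a relative residual norm tolerance of 0.01 as the iterative linear system solver. The following is a generic textbook CG sketch with that stopping rule, shown in plain numpy for illustration; it is not the paper's (GPyTorch-based, preconditioned) solver, and the function name and sizes are illustrative.

```python
import numpy as np

def conjugate_gradients(A, b, tol=0.01, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, stopping when
    the relative residual norm ||A x - b|| / ||b|| drops below tol."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual (= b, since x = 0)
    p = r.copy()           # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) / b_norm < tol:
            break          # relative residual tolerance reached
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Illustrative use on a random SPD system.
rng = np.random.default_rng(0)
n = 30
M = rng.normal(size=(n, n))
A = M.T @ M + n * np.eye(n)  # SPD by construction
b = rng.normal(size=n)
x = conjugate_gradients(A, b)
```

A relative (rather than absolute) tolerance makes the stopping rule scale-invariant in b, which is why it is the common convention in iterative GP inference; the pivoted Cholesky preconditioner mentioned in the setups accelerates convergence but is omitted here for brevity.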