KernelMatmul: Scaling Gaussian Processes to Large Time Series

Authors: Tilman Hoffbauer, Holger H. Hoos, Jakob Bossek

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We thoroughly benchmark our new method against multiple baselines to demonstrate its benefits and limitations, both in efficiency and accuracy. Our work includes a benchmark of multiple approaches to kernel matrix multiplication performance (including KeOps (Charlier et al. 2021)). This benchmark demonstrates improved performance of KernelMatmul on many configurations (Section 5.1). An investigation of the residuals (Section 5.2) showcases the low error introduced by our sparsity approximation. Additionally, we provide a comparison (Section 5.3) to other approximation schemes, such as variational inference (Wu, Pleiss, and Cunningham 2022) and structured kernel interpolation (Wilson and Nickisch 2015).
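The core operation benchmarked above is a kernel-matrix-vector product, optionally sparsified for far-apart time stamps. The following is a minimal NumPy sketch of that idea; it is an illustration only, not the paper's CUDA implementation, and the function names, the RBF kernel choice, and the distance cutoff are assumptions introduced here.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Dense RBF (Gaussian) kernel matrix for 1-D inputs (illustrative choice)."""
    sq = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq / lengthscale**2)

def dense_kmv(x, v, lengthscale=1.0):
    """Exact kernel-matrix-vector product K @ v."""
    return rbf_kernel(x, x, lengthscale) @ v

def banded_kmv(x, v, lengthscale=1.0, cutoff=5.0):
    """Approximate K @ v by zeroing pairs more than `cutoff` lengthscales apart.

    For sorted time-series inputs this yields a banded kernel matrix, which is
    the flavour of sparsity approximation the report refers to (sketch only).
    """
    K = rbf_kernel(x, x, lengthscale)
    K[np.abs(x[:, None] - x[None, :]) > cutoff * lengthscale] = 0.0
    return K @ v

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))  # irregular time stamps
v = rng.normal(size=200)
exact = dense_kmv(x, v)
approx = banded_kmv(x, v)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

Because the RBF kernel decays rapidly with distance, the banded product introduces only a tiny relative error while touching far fewer matrix entries, which is the intuition behind the low residuals reported in Section 5.2.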
Researcher Affiliation | Academia | 1. Chair for AI Methodology, RWTH Aachen University, Germany; 2. Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands; 3. University of British Columbia, Canada; 4. Chair for Machine Learning and Optimisation, Paderborn University, Germany
Pseudocode | No | The paper describes mathematical formulas and implementation details but does not contain a clearly labeled pseudocode block or algorithm section.
Open Source Code | Yes | Implementation: https://github.com/Turakar/kernel-matmul — Experiments: https://github.com/Turakar/kernel-matmul-benchmark
Open Datasets | Yes | It was executed on three datasets from the Monash Forecasting Repository: London Smart Meters, Solar and Traffic (Godahewa et al. 2021).
Dataset Splits | Yes | We use the splits into training, validation and test data provided by the Monash Forecasting Repository.
Hardware Specification | Yes | All experiments were run on an NVIDIA H100 GPU with CUDA 12.1 on Linux.
Software Dependencies | Yes | All experiments were run on an NVIDIA H100 GPU with CUDA 12.1 on Linux.
Experiment Setup | Yes | For each dataset, we compiled a list of 11 randomly selected subsets of 20 series each. Then, we performed individual hyperparameter optimizations (HPOs) for each subset and method. Every HPO is performed with SMAC3 (Lindauer et al. 2022) in a multi-instance setting for 4 hours wall time. The best configuration of each HPO is then evaluated on all series in the test dataset by retraining on the training and validation data of that series. ... KernelMatmul-based CG inference (ϵ = 10⁻⁵) ... For this dataset, we constructed a perfectly matching spectral kernel (i.e., ν = 1) with α = 0.01. ... The predictions for this second comparison were generated with GPyTorch default settings, i.e., a CG tolerance of 0.01. ... SKI used as many grid points as there are samples in the training set, while VNNGP used 64 neighbours per sample.
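The CG tolerances quoted above (ϵ = 10⁻⁵ for KernelMatmul-based inference, 0.01 for GPyTorch defaults) govern when a conjugate-gradient solve of the kernel system is considered converged. A self-contained NumPy sketch of such a tolerance-controlled CG solve is shown below; it is a generic textbook implementation under assumed names, not GPyTorch's or the paper's solver.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-5, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given only a
    matrix-vector product, stopping once the residual norm falls below
    tol * ||b|| (mirroring a CG tolerance such as eps = 1e-5)."""
    x = np.zeros_like(b)
    r = b - matvec(x)       # initial residual
    p = r.copy()            # initial search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: a small GP-style system (K + noise * I) x = y.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 5.0, 50))
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2) + 0.1 * np.eye(50)
y = rng.normal(size=50)
x = conjugate_gradient(lambda v: K @ v, y, tol=1e-5)
```

A tighter tolerance (10⁻⁵ versus 0.01) buys more accurate solves at the cost of more matrix-vector products, which is exactly where a faster kernel matmul pays off.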