Kernel-Matrix Determinant Estimates from Stopped Cholesky Decomposition

Authors: Simon Bartels, Wouter Boomsma, Jes Frellsen, Damien Garreau

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that this can save a considerable amount of time while rarely exceeding an overhead of more than 5% when not stopping early. More generally, we present a probabilistic stopping strategy for the approximation of a sum of known length where addends are revealed sequentially. We do not assume independence between addends, only that they are bounded from below and decrease in conditional expectation.
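The quoted stopping strategy can be sketched in Python. This is a simplified illustration, not the paper's PAC construction: the bounds below assume nonnegative, nonincreasing addends and extrapolate the most recent addend, whereas the paper only assumes addends bounded from below and decreasing in conditional expectation.

```python
def stopped_sum(addends, n, rel_tol=0.01):
    """Approximate a sum of n sequentially revealed addends, stopping
    early once bound-based estimates are close enough.

    Simplified sketch: assumes addends are nonnegative and nonincreasing,
    so the remaining terms lie between 0 and the last observed addend.
    """
    partial = 0.0
    for k, a in enumerate(addends, start=1):
        partial += a
        # Naive bounds on the full sum for a nonincreasing,
        # nonnegative sequence (stand-in for the paper's bounds):
        lower = partial                  # remaining addends could all be ~0
        upper = partial + (n - k) * a    # remaining addends are at most a
        estimate = 0.5 * (lower + upper)
        if upper - lower <= rel_tol * abs(estimate):
            return estimate, k           # stopped early after k addends
    return partial, n
```

For a geometric sequence the loop stops well before all n addends are consumed, while the returned estimate stays within the requested relative tolerance of the true sum.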
Researcher Affiliation | Academia | Simon Bartels, University of Copenhagen, Universitetsparken 1, 2100 København, Denmark; Wouter Boomsma, University of Copenhagen, Universitetsparken 1, 2100 København, Denmark; Jes Frellsen, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark; Damien Garreau, Université Côte d'Azur, Inria, CNRS, LJAD, Parc Valrose, 06108 Nice Cedex 2, France
Pseudocode | Yes | Algorithm 1: Augmented row-wise Cholesky decomposition with optional stopping. Highlighted are our modifications to the original algorithm. See Algorithm 3 on page 19 for a practical implementation. ... Algorithm 2: Evaluate Conditions And Estimator. At a given step, this routine computes the lower and upper bounds, and proceeds to check if they are close enough. ... Algorithm 3: Blocked and recursive formulation of Algorithm 1. Highlighted are our modifications to the original algorithm.
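A minimal Python sketch of a row-wise Cholesky decomposition with a stopping hook, assuming a symmetric positive-definite input. The stopping test here is a hypothetical stand-in for Algorithm 2's bound check (it merely extrapolates the last log-diagonal increment and stops once the estimate stabilizes), not the paper's actual criterion.

```python
import numpy as np

def cholesky_logdet_with_stopping(K, rel_tol=0.01):
    """Row-wise Cholesky that tracks the running log-determinant and
    may stop before processing all rows.

    The stopping rule is a placeholder, not the paper's Algorithm 2.
    """
    n = K.shape[0]
    L = np.zeros_like(K, dtype=float)
    logdet = 0.0
    prev_est = None
    for i in range(n):
        # Standard row-wise (Cholesky-Banachiewicz) update of row i.
        for j in range(i + 1):
            s = K[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
        logdet += 2.0 * np.log(L[i, i])
        # Placeholder stopping test: extrapolate the last log-diagonal
        # increment to the remaining rows; stop when the estimate stabilizes.
        est = logdet + (n - 1 - i) * 2.0 * np.log(L[i, i])
        if prev_est is not None and abs(est - prev_est) <= rel_tol * abs(est):
            return est, i + 1  # early stop after i + 1 rows
        prev_est = est
    return logdet, n
```

With `rel_tol=0.0` the routine degenerates to a full decomposition and returns the exact log-determinant; on a matrix whose diagonal increments are constant (e.g. a scaled identity) it stops after two rows with the correct extrapolated value.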
Open Source Code | Yes | To make this idea practical, we modified the OpenBLAS (Wang et al., 2013) implementation and made our code available at https://github.com/SimonBartels/pac_kernel_matrix_determinant_estimation
Open Datasets | Yes | From the UCI machine learning repository (Dua and Graff, 2019), we took all multivariate datasets in matrix format with 40000 to 50000 instances without missing values. Furthermore, we included the frequently used PUMADYN dataset (Snelson and Ghahramani, 2006) as a small-scale example of only 8000 instances. ... Table 1 provides an overview of all the datasets that we use. ... The URL is a suffix for http://archive.ics.uci.edu/ml/datasets/. The reference in Source acknowledges a citation request, if any.
Dataset Splits | No | The paper mentions 'We repeated each configuration for ten random permutations of the dataset.' However, it does not provide specific percentages or absolute counts for training, validation, or test splits; the random permutations refer to the order in which data points are processed during evaluation, not to distinct data partitions for model development.
Hardware Specification | Yes | All large-scale experiments (≥ 40000 datapoints) were executed on machines running Ubuntu 18.04 with 32 Gigabytes of RAM and two Intel Xeon E5-2670 v2 CPUs. The experiments for the PUMADYN dataset were run on a laptop running Ubuntu 20.04 with 16 Gigabytes of RAM and an Intel i7-8665U CPU, to demonstrate the usefulness of our approach on more standard hardware.
Software Dependencies | No | The paper mentions OpenBLAS and the SciPy method 'minimize_scalar', but does not provide specific version numbers for these software components. It only mentions the operating system versions Ubuntu 18.04 and Ubuntu 20.04.
Experiment Setup | Yes | We set σ² := 0.001 and θ := 1, and increased the lengthscale as ℓ := exp(i) for i = 1, ..., 3. In the appendix we report results where we set ℓ := 1 and θ := 1, and vary the observation noise as σ² := exp(i) for i = −4, ..., 0. The Cholesky decomposition with full pivoting takes as input parameter a desired relative error on the diagonal elements d (instead of a relative error on the log-determinant, see Eq. (15)). We ran this algorithm for d ∈ {0.001, 0.005, 0.01, 0.05, 0.1, 0.5}. After the pivoted Cholesky stopped, we computed the relative error on the log-determinant that this algorithm could guarantee in that step. Then we ran our algorithm trying to achieve the same error for δ = 0.1.
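The quoted hyperparameter sweep can be sketched as follows. The squared-exponential parameterization below (θ as signal variance, ℓ as lengthscale, σ² as observation noise added to the diagonal) is an assumption about the paper's kernel, made only for illustration; `kernel_matrix` is a hypothetical helper, not code from the paper's repository.

```python
import numpy as np

def kernel_matrix(X, lengthscale, theta=1.0, noise=1e-3):
    # Assumed squared-exponential kernel theta * exp(-||x - x'||^2 / (2 l^2)),
    # plus observation noise sigma^2 on the diagonal.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return theta * np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(X.shape[0])

# Sweep from the quoted setup: sigma^2 = 0.001, theta = 1, l = exp(i), i = 1..3.
X = np.random.default_rng(0).standard_normal((100, 8))
for i in (1, 2, 3):
    K = kernel_matrix(X, lengthscale=np.exp(i))
    # Reference log-determinant that a stopped Cholesky would approximate.
    sign, logdet = np.linalg.slogdet(K)
```

The added noise term keeps each K symmetric positive definite, so a Cholesky decomposition (stopped or full) is well defined for every lengthscale in the sweep.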