Kernel-Matrix Determinant Estimates from Stopped Cholesky Decomposition

Authors: Simon Bartels, Wouter Boomsma, Jes Frellsen, Damien Garreau

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that this can save a considerable amount of time while rarely exceeding an overhead of more than 5% when not stopping early. More generally, we present a probabilistic stopping strategy for the approximation of a sum of known length where addends are revealed sequentially. We do not assume independence between addends, only that they are bounded from below and decrease in conditional expectation.
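The quoted stopping strategy can be sketched in Python. This is a simplified illustration, not the paper's PAC construction: the bounds below assume nonnegative, nonincreasing addends and extrapolate the most recent addend, whereas the paper only assumes addends bounded from below and decreasing in conditional expectation.

```python
def stopped_sum(addends, n, rel_tol=0.01):
    """Approximate a sum of n sequentially revealed addends, stopping
    early once bound-based estimates are close enough.

    Simplified sketch: assumes addends are nonnegative and nonincreasing,
    so the remaining terms lie between 0 and the last observed addend.
    """
    partial = 0.0
    for k, a in enumerate(addends, start=1):
        partial += a
        # Naive bounds on the full sum for a nonincreasing,
        # nonnegative sequence (stand-in for the paper's bounds):
        lower = partial                  # remaining addends could all be ~0
        upper = partial + (n - k) * a    # remaining addends are at most a
        estimate = 0.5 * (lower + upper)
        if upper - lower <= rel_tol * abs(estimate):
            return estimate, k           # stopped early after k addends
    return partial, n
```

For a geometric sequence the loop stops well before all n addends are consumed, while the returned estimate stays within the requested relative tolerance of the true sum.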
Researcher Affiliation | Academia | Simon Bartels, University of Copenhagen, Universitetsparken 1, 2100 København, Denmark; Wouter Boomsma, University of Copenhagen, Universitetsparken 1, 2100 København, Denmark; Jes Frellsen, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark; Damien Garreau, Université Côte d'Azur, Inria, CNRS, LJAD, Parc Valrose, 06108 Nice Cedex 2, France
Pseudocode | Yes | Algorithm 1: Augmented row-wise Cholesky decomposition with optional stopping. Highlighted are our modifications to the original algorithm. See Algorithm 3 on page 19 for a practical implementation. ... Algorithm 2: Evaluate Conditions And Estimator. At a given step, this routine computes the lower and upper bounds, and proceeds to check if they are close enough. ... Algorithm 3: Blocked and recursive formulation of Algorithm 1. Highlighted are our modifications to the original algorithm.
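A minimal Python sketch of a row-wise Cholesky decomposition with a stopping hook, assuming a symmetric positive-definite input. The stopping test here is a hypothetical stand-in for Algorithm 2's bound check (it merely extrapolates the last log-diagonal increment and stops once the estimate stabilizes), not the paper's actual criterion.

```python
import numpy as np

def cholesky_logdet_with_stopping(K, rel_tol=0.01):
    """Row-wise Cholesky that tracks the running log-determinant and
    may stop before processing all rows.

    The stopping rule is a placeholder, not the paper's Algorithm 2.
    """
    n = K.shape[0]
    L = np.zeros_like(K, dtype=float)
    logdet = 0.0
    prev_est = None
    for i in range(n):
        # Standard row-wise (Cholesky-Banachiewicz) update of row i.
        for j in range(i + 1):
            s = K[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
        logdet += 2.0 * np.log(L[i, i])
        # Placeholder stopping test: extrapolate the last log-diagonal
        # increment to the remaining rows; stop when the estimate stabilizes.
        est = logdet + (n - 1 - i) * 2.0 * np.log(L[i, i])
        if prev_est is not None and abs(est - prev_est) <= rel_tol * abs(est):
            return est, i + 1  # early stop after i + 1 rows
        prev_est = est
    return logdet, n
```

With `rel_tol=0.0` the routine degenerates to a full decomposition and returns the exact log-determinant; on a matrix whose diagonal increments are constant (e.g. a scaled identity) it stops after two rows with the correct extrapolated value.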
Open Source Code | Yes | To make this idea practical, we modified the OpenBLAS (Wang et al., 2013) implementation and made our code available at https://github.com/SimonBartels/pac_kernel_matrix_determinant_estimation
Open Datasets | Yes | From the UCI machine learning repository (Dua and Graff, 2019), we took all multivariate datasets in matrix format with 40000 to 50000 instances without missing values. Furthermore, we included the frequently used PUMADYN dataset (Snelson and Ghahramani, 2006) as a small-scale example of only 8000 instances. ... Table 1 provides an overview of all the datasets that we use. ... The URL is a suffix for http://archive.ics.uci.edu/ml/datasets/. The reference in Source acknowledges a citation request, if any.
Dataset Splits | No | The paper mentions 'We repeated each configuration for ten random permutations of the dataset.' However, it does not provide specific percentages or absolute counts for training, validation, or test splits; the random permutations refer to the order in which data points are processed during evaluation, not to distinct data partitions for model development.
Hardware Specification | Yes | All large-scale experiments (≥ 40000 datapoints) were executed on machines running Ubuntu 18.04 with 32 Gigabytes of RAM and two Intel Xeon E5-2670 v2 CPUs. The experiments for the PUMADYN dataset were run on a laptop running Ubuntu 20.04 with 16 Gigabytes of RAM and an Intel i7-8665U CPU, to demonstrate the usefulness of our approach on more standard hardware.
Software Dependencies | No | The paper mentions OpenBLAS and the SciPy method 'minimize_scalar', but does not provide specific version numbers for these software components. It only mentions the operating system versions Ubuntu 18.04 and Ubuntu 20.04.
Experiment Setup | Yes | We set σ² := 0.001 and θ := 1, and increased the lengthscale as ℓ := exp(i) for i = 1, ..., 3. In the appendix we report results where we set ℓ := 1 and θ := 1, and vary the observation noise as σ² := exp(i) for i = −4, ..., 0. The Cholesky decomposition with full pivoting takes as input parameter a desired relative error on the diagonal elements d (instead of a relative error on the log-determinant, see Eq. (15)). We ran this algorithm for d ∈ {0.001, 0.005, 0.01, 0.05, 0.1, 0.5}. After the pivoted Cholesky stopped, we computed the relative error on the log-determinant that this algorithm could guarantee in that step. Then we ran our algorithm trying to achieve the same error for δ = 0.1.
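The quoted hyperparameter sweep can be sketched as follows. The squared-exponential parameterization below (θ as signal variance, ℓ as lengthscale, σ² as observation noise added to the diagonal) is an assumption about the paper's kernel, made only for illustration; `kernel_matrix` is a hypothetical helper, not code from the paper's repository.

```python
import numpy as np

def kernel_matrix(X, lengthscale, theta=1.0, noise=1e-3):
    # Assumed squared-exponential kernel theta * exp(-||x - x'||^2 / (2 l^2)),
    # plus observation noise sigma^2 on the diagonal.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return theta * np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(X.shape[0])

# Sweep from the quoted setup: sigma^2 = 0.001, theta = 1, l = exp(i), i = 1..3.
X = np.random.default_rng(0).standard_normal((100, 8))
for i in (1, 2, 3):
    K = kernel_matrix(X, lengthscale=np.exp(i))
    # Reference log-determinant that a stopped Cholesky would approximate.
    sign, logdet = np.linalg.slogdet(K)
```

The added noise term keeps each K symmetric positive definite, so a Cholesky decomposition (stopped or full) is well defined for every lengthscale in the sweep.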