Determinantal Point Processes for Coresets
Authors: Nicolas Tremblay, Simon Barthelmé, Pierre-Olivier Amblard
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our results to both the k-means and the linear regression problems, and give extensive empirical evidence that the small additional computational cost of DPP sampling comes with superior performance over its iid counterpart. |
| Researcher Affiliation | Academia | CNRS, Univ. Grenoble Alpes, Grenoble INP, GIPSA-lab, Grenoble, France |
| Pseudocode | Yes | Algorithm 1: The Gaussian kernel coreset sampling heuristics; Algorithm 2: The Vandermonde-based coreset sampling heuristics; Algorithm 3: Efficient J-DPP sampling algorithm with projective L-ensemble P = WWᵀ |
| Open Source Code | Yes | Finally, a Julia toolbox called DPP4Coresets is available on the authors' website. The DPP4Coresets toolbox is also available at https://gricad-gitlab.univ-grenoble-alpes.fr/tremblan/dpp4coresets.jl. |
| Open Datasets | Yes | The MNIST data set (Le Cun, 1998). We also perform experiments on the 1990 US Census data set (downloaded from https://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990)). |
| Dataset Splits | No | The paper uses generated data (Stochastic Block Model graphs) or existing datasets (MNIST, US Census) for experiments to evaluate coreset properties and k-means performance, but does not provide specific training/test/validation splits for model training or evaluation. For example, for MNIST, it describes classifying digits but doesn't mention how the dataset was partitioned into train/test sets for this classification task. |
| Hardware Specification | Yes | Experiments were made on a laptop with 8 cores and 16 GB of memory, with the Julia toolbox available on the authors' website. |
| Software Dependencies | No | The paper mentions a Julia toolbox called "DPP4Coresets", the "DPP.jl" package, and the Python toolbox "DPPy". While these are specific software tools, no version numbers are given for Julia, Python, or the toolboxes themselves, and such version information is necessary for reproducibility. |
| Experiment Setup | Yes | To measure the performance of each method, we will empirically estimate the probability that, given the method's sampled weighted subset, it verifies the coreset property of Eq. (4) for a given randomly chosen θ (setting ϵ to 0.1). For m-DPP, several values of τ were tried, and we show here the result obtained for τ = 1.5. Also, a number r = 200 of Fourier features were used. In a second experiment, for m-DPP, τ was set to 70 (the mean interdistance estimated on 1000 randomly chosen pairs of datapoints), and a number r = 30 of Fourier features was chosen. |
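The Pseudocode and Experiment Setup rows mention a Gaussian kernel heuristic with bandwidth τ, r random Fourier features, and an efficient sampler for a projective L-ensemble P = WWᵀ. The following is a minimal Python sketch of that kind of pipeline under stated assumptions; it is not the DPP4Coresets implementation. It assumes the kernel convention exp(-‖x − y‖²/τ²), takes W as the top-m left singular vectors of the feature matrix, samples with the standard sequential algorithm for projection DPPs, and weights each sampled point by the inverse of its marginal inclusion probability Pᵢᵢ. All function names (`mean_interdistance`, `random_fourier_features`, `sample_projection_dpp`, `dpp_coreset`) are hypothetical.

```python
import numpy as np


def mean_interdistance(X, n_pairs, rng):
    """Estimate the mean Euclidean distance between random distinct pairs of rows of X."""
    n = len(X)
    i = rng.integers(0, n, size=n_pairs)
    # An offset in [1, n-1] guarantees j != i.
    j = (i + rng.integers(1, n, size=n_pairs)) % n
    return float(np.linalg.norm(X[i] - X[j], axis=1).mean())


def random_fourier_features(X, r, tau, rng):
    """r random Fourier features whose inner products approximate the Gaussian
    kernel exp(-||x - y||^2 / tau^2) (this bandwidth convention is an assumption)."""
    d = X.shape[1]
    # The spectral density of exp(-||delta||^2 / tau^2) is N(0, (2 / tau^2) I).
    omega = rng.normal(scale=np.sqrt(2.0) / tau, size=(d, r))
    b = rng.uniform(0.0, 2.0 * np.pi, size=r)
    return np.sqrt(2.0 / r) * np.cos(X @ omega + b)


def sample_projection_dpp(W, rng):
    """Sequentially sample the projection DPP with kernel P = W W^T, where
    W (n x m) has orthonormal columns; returns m distinct indices."""
    n, m = W.shape
    V = W.copy()
    sample = []
    for _ in range(m):
        # Point i is drawn with probability proportional to ||V[i]||^2.
        p = np.clip(np.sum(V**2, axis=1), 0.0, None)
        i = int(rng.choice(n, p=p / p.sum()))
        sample.append(i)
        # Condition on i: project every row onto the orthogonal complement of V[i].
        v = V[i] / np.linalg.norm(V[i])
        V = V - np.outer(V @ v, v)
    return np.array(sample)


def dpp_coreset(X, m, r, tau, rng):
    """Hypothetical pipeline: features -> rank-m projection -> DPP sample -> weights."""
    psi = random_fourier_features(X, r, tau, rng)
    # Top-m left singular vectors of psi give W (this sketch requires m <= r).
    W = np.linalg.svd(psi, full_matrices=False)[0][:, :m]
    idx = sample_projection_dpp(W, rng)
    # The marginal inclusion probability of point i is P_ii = ||W[i]||^2.
    weights = 1.0 / np.sum(W[idx] ** 2, axis=1)
    return idx, weights
```

With the settings quoted in the table, one would pass r = 200 and τ = 1.5 for the first experiment, or r = 30 and τ = `mean_interdistance(X, 1000, rng)` for the second. Because the kernel is a rank-m projection, the sampler always returns exactly m distinct points, which is what makes the sequential algorithm cheap compared with a generic DPP sampler.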
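The Experiment Setup row evaluates each method by the empirical probability that its weighted subset satisfies the coreset property of Eq. (4) for a randomly chosen θ with ϵ = 0.1. Assuming the usual relative-error form of that property, |L_S(θ) − L(θ)| ≤ ϵ L(θ) with L the k-means cost, a minimal Python sketch of such an estimator follows. How θ is drawn in the paper is not stated here, so picking k centroids uniformly among the data points is an assumption, as is the iid uniform baseline (with replacement, weight n/m per point). All names (`kmeans_cost`, `coreset_success_rate`, `iid_uniform_sampler`) are hypothetical.

```python
import numpy as np


def kmeans_cost(X, centroids, weights=None):
    """(Weighted) sum of squared distances from each row of X to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(d2.sum() if weights is None else (weights * d2).sum())


def coreset_success_rate(X, theta, sampler, n_trials, eps, rng):
    """Fraction of sampled weighted subsets whose cost at theta is within
    relative error eps of the full-data cost."""
    full = kmeans_cost(X, theta)
    hits = 0
    for _ in range(n_trials):
        idx, w = sampler(X, rng)
        approx = kmeans_cost(X[idx], theta, w)
        hits += int(abs(approx - full) <= eps * full)
    return hits / n_trials


def iid_uniform_sampler(m):
    """iid baseline: m indices drawn uniformly with replacement, weight n/m each."""
    def sampler(X, rng):
        n = len(X)
        return rng.integers(0, n, size=m), np.full(m, n / m)
    return sampler


# Usage: one random theta made of k = 5 centroids picked among the data points.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
theta = X[rng.choice(len(X), size=5, replace=False)]
rate = coreset_success_rate(X, theta, iid_uniform_sampler(100), 200, 0.1, rng)
print(f"estimated P(coreset property holds) = {rate:.2f}")
```

Any sampler returning `(indices, weights)` can be plugged in, so the iid baseline and a DPP-based sampler can be compared on the same θ and ϵ.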