A determinantal point process for column subset selection
Authors: Ayoub Belhadji, Rémi Bardenet, Pierre Chainais
JMLR 2020 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments suggest that our bounds are tight, and that our algorithms have comparable performance with the double phase algorithm, often considered the practical state-of-the-art. Keywords: column subset selection, determinantal point process, volume sampling, leverage score sampling, low-rank approximation |
| Researcher Affiliation | Academia | Ayoub Belhadji EMAIL R emi Bardenet EMAIL Pierre Chainais EMAIL Universit e de Lille, UMR 9189 CRISt AL, F-59000 Lille, France CNRS, UMR 9189, F59000 Lille, France Centrale Lille, F-59000 Lille, France |
| Pseudocode | Yes | Figure 1: Pseudocode for sampling from a DPP of marginal kernel K. Figure 3: The pseudocode of the algorithm generating a matrix X with prescribed profile of k-leverage scores. Figure 18: The pseudocode of the generator of random valid eigensteps taking as input (ℓ, σ). |
| Open Source Code | Yes | Finally, we packaged all CSS algorithms in this section in a publicly available Python toolbox5. 5. http://github.com/Ayoub Belhadji/CSSPy |
| Open Datasets | Yes | Colon genomics 62 2000 (Alon et al., 1999) Leukemia genomics 72 7129 (Golub et al., 1999) Basehock text processing 1993 4862 (Li et al., 2017b) Relathe text processing 1427 4322 (Li et al., 2017b) |
| Dataset Splits | No | The paper discusses the generation of toy datasets with specific properties and samples of subsets for analysis, but does not provide standard training/validation/test splits for the real datasets used in the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions 'Python toolbox' for its code and refers to 'DPPy toolbox' but does not specify any version numbers for Python or other key software libraries used for the experiments. |
| Experiment Setup | Yes | The number of observations is fixed to N = 100, the dimension to d = 20, and the number of selected columns to k {3, 5}. An ensemble of 50 subsets are sampled using each algorithm. We give the corresponding approximation errors yi X ˆw S 2, on Colon and Basehock respectively, for every value of k {10, 15, 20, 25, 30}. double phase with c = 10k OMP-mixcol, consists in outputting the columns S selected by OMP with the target vector z = Xc, where c N(0, Id). The second variant, called OMP-isotropic, consists in regressing z N(0, IN). |