On Sparsity and Sub-Gaussianity in the Johnson-Lindenstrauss Lemma

Authors: Aurélien Garivier, Emmanuel Pilliat

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We provide a simple proof of the Johnson-Lindenstrauss lemma for sub-Gaussian variables. We extend the analysis to identify how sparse projections can be, and what the cost of sparsity is on the target dimension. ... In this paper, we propose a simple and elementary analysis of random projections under classical assumptions that emphasizes the key role of sub-Gaussianity. Furthermore, we show how to extend it to sparse projections, emphasizing the limits induced by the sparsity of the data itself. ... Our goal is to make the analysis of sparse random projections accessible to a wider audience, and the main contribution of this paper is two-fold. The first purpose is to highlight that sub-Gaussianity is in fact an elementary property of random matrix entries that suffices to ensure the success of random projections. ... The second purpose of this paper is to build on this analysis to clearly emphasize the conditions on the data under which much sparser random projection matrices can be considered."
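To make the quoted contribution concrete, here is a minimal sketch of a dense Johnson-Lindenstrauss projection with sub-Gaussian (here standard Gaussian) entries. The dimensions p and d and the tolerance eps are illustrative choices, not values fixed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, eps = 10_000, 500, 0.2

# A data point in R^p and a d x p Gaussian projection matrix,
# scaled by 1/sqrt(d) so that E[||A x||^2] = ||x||^2.
x = rng.standard_normal(p)
A = rng.standard_normal((d, p)) / np.sqrt(d)

# With high probability the squared norm is preserved up to
# a (1 +/- eps) factor; this is the JL norm-preservation event.
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
print(ratio)  # typically close to 1
```

The sub-Gaussianity of the matrix entries is what drives the concentration of `ratio` around 1; any zero-mean, unit-variance sub-Gaussian distribution could be substituted for `standard_normal` here.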
Researcher Affiliation | Academia | Aurélien Garivier (EMAIL), ENS de Lyon, UMPA UMR 5669 / LIP UMR 5668, 46 allée d'Italie, F-69364 Lyon cedex 07, and Emmanuel Pilliat (EMAIL), ENSAI Rennes.
Pseudocode | No | The paper contains mathematical derivations and proofs but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to any code repositories.
Open Datasets | No | "Each data point xi ∈ R^10000 has independent Gaussian entries. ... Each data point xi ∈ R^10000 has exactly s non-zero components, which are independent Gaussian entries. ... Each of the n = 100 data points xi ∈ R^10000 has independent coefficients that are non-zero with probability s/p; the non-zero coefficients are independent and uniformly distributed on {-1, +1}." This indicates synthetic data generation, not the use of publicly available datasets.
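The three synthetic data models quoted above can be sketched as follows. The sparsity level s is an assumed illustrative value; the paper's figures may use different settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 10_000, 50  # s is an assumed sparsity level

# (1) Dense model: each point has independent Gaussian entries.
X_dense = rng.standard_normal((n, p))

# (2) Exactly-sparse model: each point has exactly s non-zero
# components, placed uniformly at random, with Gaussian values.
X_sparse = np.zeros((n, p))
for i in range(n):
    idx = rng.choice(p, size=s, replace=False)
    X_sparse[i, idx] = rng.standard_normal(s)

# (3) Bernoulli-Rademacher model: each coefficient is non-zero
# with probability s/p, and non-zero values are uniform on {-1, +1}.
mask = rng.random((n, p)) < s / p
X_rademacher = mask * rng.choice([-1.0, 1.0], size=(n, p))
```

Model (2) fixes the sparsity of every point exactly, while model (3) only controls it in expectation; both appear in the quoted descriptions.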
Dataset Splits | No | The paper describes generating synthetic data for its figures and theoretical analysis, so there is no mention of dataset splits such as training, validation, or test sets.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running simulations or computations.
Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that would be needed to replicate the work.
Experiment Setup | No | The paper discusses parameters such as a target dimension d = 500 and a sparsity parameter q in the context of theoretical analysis and simulations, but these are not hyperparameters or system-level training settings for a machine learning experiment.
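For illustration, the mentioned target dimension d = 500 and sparsity parameter q can be combined in a common sparse-projection construction (entries non-zero with probability q, taking values ±1/√(qd)). This is a standard scheme sketched under assumed values of q; it is not claimed to be the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, q = 10_000, 500, 0.05  # q = 0.05 is an assumed value

# Sparse projection: each entry is non-zero with probability q,
# with values +/- 1/sqrt(q*d) so that E[A_ij^2] = 1/d and hence
# E[||A x||^2] = ||x||^2, as in the dense Gaussian case.
signs = rng.choice([-1.0, 1.0], size=(d, p))
mask = rng.random((d, p)) < q
A = signs * mask / np.sqrt(q * d)

x = rng.standard_normal(p)
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
print(ratio)  # concentrates near 1 for dense enough x
```

Only a fraction q of the matrix entries are stored or touched, which is the computational payoff of sparse projections; the paper's analysis concerns how small q can be made depending on the sparsity of the data itself.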