Provable Guarantees for Sparsity Recovery with Deterministic Missing Data Patterns
Authors: Chuyang Ke, Jean Honorio
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate the proposed Algorithm 1 through synthetic experiments. Experiment 1: We test four imputation strategies in the task of censored sparsity recovery: our method, imputation by zero, imputation by mean, and imputation by median. We generate w* such that features in the support are randomly drawn from [−1, −0.25] ∪ [0.25, 1]. X is generated from a Gaussian distribution with mean 0 and covariance Σ. We set the diagonal of Σ to 1 and the off-diagonal entries to 0.8. We fix the number of samples n = 1000, the number of features p = 50, and the support size s = 10. The variable is the percentage of missing entries in the observed X̂. We plot the probability of censored sparsity recovery P and the ℓ∞ distance ‖ŵ − w*‖∞ between the recovered and the true vector, against the percentage of missing entries, in Figure 1a and Figure 1b, respectively. Each trial is run 100 times. |
| Researcher Affiliation | Academia | Chuyang Ke (Computer Science, Purdue University); Jean Honorio (School of Computing and Information Systems, The University of Melbourne) |
| Pseudocode | Yes | Algorithm 1 Censored Sparsity Recovery. Input: observed dataset (X_M, y), regularization parameter λ. Output: imputed feature matrix X̂, recovered model vector ŵ. 1: Compute the sample covariance matrix H from X_M. 2–4: For every feature pair (i, j) ∈ [p] × [p], compute the empirical neighbor score H_{i,j}² / H_{j,j}. 5–8: For every feature i ∈ [p], compute the top neighboring feature π(i) = arg max over j ∈ [p] \ {i} of the neighbor score, and the empirical error ratio H_{i,π(i)} / H_{π(i),π(i)}. 9: Initialize the imputation matrix X̃ ∈ R^{n×p}. 10–12: For every entry (k, i) ∈ [n] × [p], set X̃_{k,i} ← X_{k,π(i)} times the error ratio for i. 13: Compute the imputed matrix X̂ = M ∘ X + M^c ∘ X̃. 14: Solve the Lasso program ŵ = arg min_w ℓ(w) + λ‖w‖₁ (Eq. 2), where ℓ(w) = … |
| Open Source Code | No | The paper does not explicitly provide a link to source code, nor does it state that the code will be made available in supplementary materials or upon publication. There is a link to OpenReview, which is a peer-review platform, not a code repository. |
| Open Datasets | No | The paper uses synthetic data for its experiments, stating, "We generate w* such that features in the support are randomly drawn in [−1, −0.25] ∪ [0.25, 1]. X is generated from Gaussian distribution, with mean 0 and covariance Σ." It does not provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper uses synthetic data, generating X from a Gaussian distribution. It defines parameters like n=1000, p=50, and s=10. However, it does not specify any training/test/validation dataset splits, as the data is generated for each experiment rather than partitioned from a larger fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or details about the computing environment used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python 3.8, PyTorch 1.9, etc.) used in the implementation of the algorithms or experiments. |
| Experiment Setup | Yes | Experiment 1: We control the number of samples n = 1000, number of features p = 50, and size of support s = 10. The variable is the percentage of missing entries in the observed X̂... We set the diagonal of Σ to 1, and the off-diagonal to 0.8... Theorem 2: By setting the regularization parameter λ > 20 h_max g_max / ... Experiment 3: We control the parameters by setting n = 200, p = 50, and s = 10. |
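To make the extracted evidence concrete, the following is a minimal NumPy sketch of the pipeline the table describes: the Experiment 1 synthetic setup (n = 1000, p = 50, s = 10, Σ with unit diagonal and 0.8 off-diagonal, support weights in [−1, −0.25] ∪ [0.25, 1]) followed by Algorithm 1's neighbor-based imputation and the Lasso step. The random mask, the noise-free response y, the λ value, and the coordinate-descent Lasso solver are all assumptions for illustration; the paper's deterministic censoring pattern and solver are not specified in the extract.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Experiment 1 setup (values from the table; noise model is an assumption) ---
n, p, s = 1000, 50, 10
Sigma = np.full((p, p), 0.8) + 0.2 * np.eye(p)   # 1 on diagonal, 0.8 off-diagonal
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# True sparse vector w*: support entries drawn from [-1, -0.25] U [0.25, 1].
w_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
w_star[support] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(0.25, 1.0, size=s)
y = X @ w_star                                    # noiseless response (assumption)

# Missing pattern: mask M (True = observed). A random mask stands in for the
# paper's deterministic censoring mechanism.
M = rng.random((n, p)) > 0.2
X_obs = np.where(M, X, 0.0)                       # zero-filled observed matrix X_M

# --- Algorithm 1: covariance-based single-neighbor imputation ---
H = (X_obs.T @ X_obs) / n                         # sample covariance from X_M
score = H**2 / np.diag(H)[None, :]                # neighbor score H_ij^2 / H_jj
np.fill_diagonal(score, -np.inf)                  # exclude j = i
pi = np.argmax(score, axis=1)                     # top neighbor pi(i) per feature
ratio = H[np.arange(p), pi] / H[pi, pi]           # error ratio H_{i,pi(i)} / H_{pi(i),pi(i)}

X_tilde = X_obs[:, pi] * ratio[None, :]           # impute each column from its neighbor
X_hat = np.where(M, X_obs, X_tilde)               # keep observed, fill missing (Eq. step 13)

# --- Lasso step (Eq. 2); cyclic coordinate descent is a stand-in solver ---
def lasso(X, y, lam, iters=200):
    w = np.zeros(X.shape[1])
    col_sq = (X**2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]        # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam * len(y), 0.0) / col_sq[j]
    return w

w_hat = lasso(X_hat, y, lam=0.05)                 # lam chosen for illustration
```

With highly correlated features (0.8 off-diagonal) recovery quality degrades as the missing rate grows, which is exactly the sweep Figures 1a and 1b report.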