Provable Guarantees for Sparsity Recovery with Deterministic Missing Data Patterns
Authors: Chuyang Ke, Jean Honorio
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate the proposed Algorithm 1 through synthetic experiments. Experiment 1: We test four imputation strategies in the task of censored sparsity recovery: our method, imputation by zero, imputation by mean, and imputation by median. We generate w* such that features in the support are randomly drawn from [−1, −0.25] ∪ [0.25, 1]. X is generated from a Gaussian distribution with mean 0 and covariance Σ. We set the diagonal of Σ to 1 and the off-diagonal entries to 0.8. We fix the number of samples n = 1000, the number of features p = 50, and the support size s = 10. The variable is the percentage of missing entries in the observed X̂. We plot the probability of censored sparsity recovery P and the ℓ∞ distance ‖ŵ − w*‖∞ between the recovered and the true vector, against the percentage of missing entries, in Figure 1a and Figure 1b, respectively. Each trial is run 100 times. |
| Researcher Affiliation | Academia | Chuyang Ke (Computer Science, Purdue University); Jean Honorio (School of Computing and Information Systems, The University of Melbourne) |
| Pseudocode | Yes | Algorithm 1 Censored Sparsity Recovery. Input: observed dataset (X_M, y), regularization parameter λ. Output: imputed feature matrix X̂, recovered model vector ŵ. 1: Compute the sample covariance matrix H from X_M. 2–4: For every feature pair (i, j) ∈ [p] × [p], compute the empirical neighbor score H_{i,j}² / H_{j,j}. 5–8: For every feature i ∈ [p], compute the top neighboring feature π(i) = arg max over j ∈ [p] \ {i} of the neighbor score, and the empirical error ratio H_{i,π(i)} / H_{π(i),π(i)}. 9: Initialize the imputation matrix X̃ ∈ R^{n×p}. 10–12: For every entry (k, i) ∈ [n] × [p], set X̃_{k,i} ← X_{k,π(i)} times the error ratio for i. 13: Compute the imputed matrix X̂ = M ∘ X + M^c ∘ X̃. 14: Solve the Lasso program ŵ = arg min_w ℓ(w) + λ‖w‖₁ (Eq. 2), where ℓ(w) = … |
| Open Source Code | No | The paper does not explicitly provide a link to source code, nor does it state that the code will be made available in supplementary materials or upon publication. There is a link to OpenReview, which is a peer-review platform, not a code repository. |
| Open Datasets | No | The paper uses synthetic data for its experiments, stating, "We generate w* such that features in the support are randomly drawn in [−1, −0.25] ∪ [0.25, 1]. X is generated from Gaussian distribution, with mean 0 and covariance Σ." It does not provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper uses synthetic data, generating X from a Gaussian distribution. It defines parameters like n=1000, p=50, and s=10. However, it does not specify any training/test/validation dataset splits, as the data is generated for each experiment rather than partitioned from a larger fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or details about the computing environment used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python 3.8, PyTorch 1.9, etc.) used in the implementation of the algorithms or experiments. |
| Experiment Setup | Yes | Experiment 1: We control the number of samples n = 1000, number of features p = 50, and size of support s = 10. The variable is the percentage of missing entries in the observed X̂... We set the diagonal of Σ to 1, and the off-diagonal to 0.8... Theorem 2: By setting the regularization parameter λ > 20 h_max g_max / ... Experiment 3: We control the parameters by setting n = 200, p = 50, and s = 10. |
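To make the extracted evidence concrete, the following is a minimal NumPy sketch of the pipeline the table describes: the Experiment 1 synthetic setup (n = 1000, p = 50, s = 10, Σ with unit diagonal and 0.8 off-diagonal, support weights in [−1, −0.25] ∪ [0.25, 1]) followed by Algorithm 1's neighbor-based imputation and the Lasso step. The random mask, the noise-free response y, the λ value, and the coordinate-descent Lasso solver are all assumptions for illustration; the paper's deterministic censoring pattern and solver are not specified in the extract.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Experiment 1 setup (values from the table; noise model is an assumption) ---
n, p, s = 1000, 50, 10
Sigma = np.full((p, p), 0.8) + 0.2 * np.eye(p)   # 1 on diagonal, 0.8 off-diagonal
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# True sparse vector w*: support entries drawn from [-1, -0.25] U [0.25, 1].
w_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
w_star[support] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(0.25, 1.0, size=s)
y = X @ w_star                                    # noiseless response (assumption)

# Missing pattern: mask M (True = observed). A random mask stands in for the
# paper's deterministic censoring mechanism.
M = rng.random((n, p)) > 0.2
X_obs = np.where(M, X, 0.0)                       # zero-filled observed matrix X_M

# --- Algorithm 1: covariance-based single-neighbor imputation ---
H = (X_obs.T @ X_obs) / n                         # sample covariance from X_M
score = H**2 / np.diag(H)[None, :]                # neighbor score H_ij^2 / H_jj
np.fill_diagonal(score, -np.inf)                  # exclude j = i
pi = np.argmax(score, axis=1)                     # top neighbor pi(i) per feature
ratio = H[np.arange(p), pi] / H[pi, pi]           # error ratio H_{i,pi(i)} / H_{pi(i),pi(i)}

X_tilde = X_obs[:, pi] * ratio[None, :]           # impute each column from its neighbor
X_hat = np.where(M, X_obs, X_tilde)               # keep observed, fill missing (Eq. step 13)

# --- Lasso step (Eq. 2); cyclic coordinate descent is a stand-in solver ---
def lasso(X, y, lam, iters=200):
    w = np.zeros(X.shape[1])
    col_sq = (X**2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]        # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam * len(y), 0.0) / col_sq[j]
    return w

w_hat = lasso(X_hat, y, lam=0.05)                 # lam chosen for illustration
```

With highly correlated features (0.8 off-diagonal) recovery quality degrades as the missing rate grows, which is exactly the sweep Figures 1a and 1b report.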