Multiple-Splitting Projection Test for High-Dimensional Mean Vectors
Authors: Wanjun Liu, Xiufan Yu, Runze Li
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical studies show that the proposed test well retains the type I error rate and is more powerful than state-of-the-art tests. Keywords: Exchangeable p-values; High-dimensional mean tests; Multiple data-splitting; Optimal projection direction; Regularized quadratic optimization. 1. Introduction... In this section, we conduct numerical studies to demonstrate the finite-sample performance of the proposed MPT through both Monte Carlo simulation and a real data example. |
| Researcher Affiliation | Collaboration | Wanjun Liu, EMAIL, LinkedIn Corporation, Sunnyvale, CA 94085, USA; Xiufan Yu, EMAIL, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA; Runze Li, EMAIL, Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA |
| Pseudocode | Yes | Algorithm 1 Multiple-splitting Projection Test (MPT) 1: Input: dataset $D$, the number of splits $m$, $n_1$, and significance level $\alpha$ 2: Step 1: randomly generate $m$ permutations of $\{1, \dots, n\}$, denoted by $\pi_k$, $k = 1, \dots, m$ 3: Step 2: obtain multiple p-values 4: for $k = 1$ to $m$ do 5: (1) partition the permuted sample $D^{\pi_k}$ into $D_1^{\pi_k}$ and $D_2^{\pi_k}$ and obtain $\bar{x}_1^k$, $\hat{\Sigma}_1^k$ from $D_1^{\pi_k}$ 6: (2) estimate $\hat{w}^k$ using a stationary point of $\min_w \tfrac{1}{2} w^\top \hat{\Sigma}_1^k w - (\bar{x}_1^k)^\top w + P_\lambda(w)$ 7: (3) project $D_2^{\pi_k}$ and obtain $y_i^k = (\hat{w}^k)^\top x_{\pi_k(i)}$, $i = n_1 + 1, \dots, n$ 8: (4) $T_{\hat{w}^k} = \sqrt{n_2}\, \bar{y}^k / s_y^k$, where $\bar{y}^k$ and $(s_y^k)^2$ are the sample mean and variance of $\{y_{n_1+1}^k, \dots, y_n^k\}$ 9: (5) compute the p-values by $p_k = 2(1 - \Phi(\lvert T_{\hat{w}^k} \rvert))$ 10: end for 11: Step 3: combine the p-values 12: (1) compute the sample mean $\bar{Z}$ and variance $s_Z^2$ of $\{Z_k = \Phi^{-1}(p_k),\ k = 1, \dots, m\}$ 13: (2) compute test statistic $M_{\hat{\rho}} = \bar{Z} / \sqrt{(1 + (m - 1)\hat{\rho})/m}$ 14: Return: Reject $H_0$ at level $\alpha$ if $\lvert M_{\hat{\rho}} \rvert > c(m, \alpha/2)$ |
| Open Source Code | No | The paper does not provide an explicit statement or link to any open-source code for the methodology described. |
| Open Datasets | Yes | We apply the proposed MPT and SPT together with other tests introduced above to a real dataset of high resolution micro-computed tomography (Percival et al., 2014). This dataset contains skull bone densities of $n = 29$ mice with genotype T0A1 in a genetic mutation study. |
| Dataset Splits | Yes | Let $D = \{x_1, \dots, x_n\}$ denote the set of full sample and we partition the full sample into two disjoint sets $D_1 = \{x_1, \dots, x_{n_1}\}$ and $D_2 = \{x_{n_1+1}, \dots, x_n\}$ with $\lvert D_1 \rvert = n_1$ and $\lvert D_2 \rvert = n_2 = n - n_1$. The idea is to use $D_1$ to estimate the optimal projection direction while use $D_2$ to conduct the test with projected sample... We set $\kappa = 0.5$ when implementing the SPT and the MPT, i.e., half of the sample is used to estimate the projection direction and the other half is used to perform the test. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions 'Monte Carlo simulation' and 'real data example' without hardware specifications. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. It mentions implementing tests but does not provide details on the software environment. |
| Experiment Setup | Yes | We generate a random sample of size $n$ from $N_p(c\mu, \Sigma)$ with $\mu = (1_{10}^\top, 0_{p-10}^\top)^\top$. We set $c = 0, 0.5$ to examine the size and the power of these tests, respectively. To examine the test robustness to non-normally distributed data, we also generate random samples from a multivariate $t_6$-distribution. Let $\sigma_{ij}$ be the $(i, j)$ entry in $\Sigma$. For $r \in (0, 1)$, we consider the following two covariance matrices: (1) compound symmetry (CS) with $\sigma_{ij} = r$ if $i \neq j$ and $\sigma_{ij} = 1$ if $i = j$ and (2) autocorrelation (AR) with $\sigma_{ij} = r^{\lvert i - j \rvert}$. We vary $r$ from 0.1 to 0.9 with step size 0.1 to examine the impact of correlation on size and power. We set sample size $n = 40, 100$ and dimension $p = 1000$... We set $\kappa = 0.5$ when implementing the SPT and the MPT... The quantile approach $\hat{\rho}_2$ is used to estimate pairwise correlation among $Z_k$'s. We set the type I error rate $\alpha = 0.05$. All simulation results are based on 10,000 independent replications. |
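
Since no reference code is released, the following is a minimal Python sketch of the flow in Algorithm 1, offered only as an illustration and not as the authors' implementation. Several pieces are assumptions: the penalty $P_\lambda$ is replaced by a ridge penalty so that the stationary point has a closed form, the correlation estimate `rho` is passed in by the caller because the quantile estimator is not defined in the excerpts above, and the critical value $c(m, \alpha/2)$ is not computed. The function names `split_pvalue` and `mpt_statistic` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def split_pvalue(X, perm, n1, lam):
    """One split: estimate the direction on the first n1 permuted rows,
    then run a one-dimensional test on the projected remaining rows."""
    X1, X2 = X[perm[:n1]], X[perm[n1:]]
    xbar1 = X1.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)  # p x p sample covariance (needs p >= 2)
    # ASSUMPTION: ridge penalty P_lambda(w) = (lam / 2) * ||w||^2 stands in
    # for the paper's penalty; its stationary point is (S1 + lam * I)^{-1} xbar1.
    w = np.linalg.solve(S1 + lam * np.eye(X.shape[1]), xbar1)
    y = X2 @ w  # projected held-out sample
    T = np.sqrt(len(y)) * y.mean() / y.std(ddof=1)
    return 2.0 * norm.sf(abs(T))  # same as 2 * (1 - Phi(|T|))


def mpt_statistic(X, m, n1, rho, lam=1.0, seed=0):
    """Combine p-values from m random splits into the statistic M.

    `rho` is a caller-supplied estimate of the pairwise correlation among
    the Z_k (ASSUMPTION: the estimator itself is not implemented here).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pvals = np.array(
        [split_pvalue(X, rng.permutation(n), n1, lam) for _ in range(m)]
    )
    # Guard against p-values of exactly 0 or 1, which map to +/- infinity.
    pvals = np.clip(pvals, 1e-300, 1.0 - 1e-16)
    Z = norm.ppf(pvals)
    M = Z.mean() / np.sqrt((1.0 + (m - 1) * rho) / m)
    return M, pvals
```

A call such as `mpt_statistic(X, m=10, n1=X.shape[0] // 2, rho=0.5)` mirrors the $\kappa = 0.5$ split; the decision step would still require comparing $\lvert M \rvert$ against the critical value defined in the paper.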