Multiple-Splitting Projection Test for High-Dimensional Mean Vectors

Authors: Wanjun Liu, Xiufan Yu, Runze Li

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical studies show that the proposed test well retains the type I error rate and is more powerful than state-of-the-art tests. Keywords: Exchangeable p-values; High-dimensional mean tests; Multiple data-splitting; Optimal projection direction; Regularized quadratic optimization. 1. Introduction... In this section, we conduct numerical studies to demonstrate the finite-sample performance of the proposed MPT through both Monte Carlo simulation and a real data example.
Researcher Affiliation | Collaboration | Wanjun Liu, EMAIL, LinkedIn Corporation, Sunnyvale, CA 94085, USA; Xiufan Yu, EMAIL, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA; Runze Li, EMAIL, Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA
Pseudocode | Yes | Algorithm 1: Multiple-splitting Projection Test (MPT)
  Input: dataset $D$, the number of splits $m$, $n_1$, and significance level $\alpha$
  Step 1: randomly generate $m$ permutations of $\{1, \dots, n\}$, denoted by $\pi_k$, $k = 1, \dots, m$
  Step 2: obtain multiple p-values
  for $k = 1$ to $m$ do
    (1) partition the permuted sample $D^{\pi_k}$ into $D_1^{\pi_k}$ and $D_2^{\pi_k}$ and obtain $\bar{x}_1^k$, $\hat{\Sigma}_1^k$ from $D_1^{\pi_k}$
    (2) estimate $\hat{w}_k$ using a stationary point of $\min_w \; \tfrac{1}{2} w^\top \hat{\Sigma}_1^k w - (\bar{x}_1^k)^\top w + P_\lambda(w)$
    (3) project $D_2^{\pi_k}$ and obtain $y_i^k = \hat{w}_k^\top x_{\pi_k(i)}$, $i = n_1 + 1, \dots, n$
    (4) $T_{\hat{w}_k} = \sqrt{n_2}\, \bar{y}^k / s_y^k$, where $\bar{y}^k$ and $(s_y^k)^2$ are the sample mean and variance of $\{y_{n_1+1}^k, \dots, y_n^k\}$
    (5) compute the p-values by $p_k = 2\,(1 - \Phi(|T_{\hat{w}_k}|))$
  end for
  Step 3: combine the p-values
    (1) compute the sample mean $\bar{Z}$ and variance $s_Z^2$ of $\{Z_k = \Phi^{-1}(p_k),\ k = 1, \dots, m\}$
    (2) compute test statistic $M_{\hat{\rho}} = \bar{Z} / \sqrt{(1 + (m - 1)\hat{\rho})/m}$
  Return: Reject $H_0$ at level $\alpha$ if $|M_{\hat{\rho}}| > c(m, \alpha/2)$
Open Source Code | No | The paper does not provide an explicit statement or link to any open-source code for the methodology described.
Open Datasets | Yes | We apply the proposed MPT and SPT together with other tests introduced above to a real dataset of high resolution micro-computed tomography (Percival et al., 2014). This dataset contains skull bone densities of $n = 29$ mice with genotype T0A1 in a genetic mutation study.
Dataset Splits | Yes | Let $D = \{x_1, \dots, x_n\}$ denote the set of full sample and we partition the full sample into two disjoint sets $D_1 = \{x_1, \dots, x_{n_1}\}$ and $D_2 = \{x_{n_1+1}, \dots, x_n\}$ with $|D_1| = n_1$ and $|D_2| = n_2 = n - n_1$. The idea is to use $D_1$ to estimate the optimal projection direction while use $D_2$ to conduct the test with projected sample... We set $\kappa = 0.5$ when implementing the SPT and the MPT, i.e., half of the sample is used to estimate the projection direction and the other half is used to perform the test.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions 'Monte Carlo simulation' and 'real data example' without hardware specifications.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. It mentions implementing tests but does not provide details on the software environment.
Experiment Setup | Yes | We generate a random sample of size $n$ from $N_p(c\mu, \Sigma)$ with $\mu = (1_{10}^\top, 0_{p-10}^\top)^\top$. We set $c = 0, 0.5$ to examine the size and the power of these tests, respectively. To examine the test robustness to non-normally distributed data, we also generate random samples from a multivariate $t_6$-distribution. Let $\sigma_{ij}$ be the $(i, j)$ entry in $\Sigma$. For $r \in (0, 1)$, we consider the following two covariance matrices: (1) compound symmetry (CS) with $\sigma_{ij} = r$ if $i \neq j$ and $\sigma_{ij} = 1$ if $i = j$ and (2) autocorrelation (AR) with $\sigma_{ij} = r^{|i - j|}$. We vary $r$ from 0.1 to 0.9 with step size 0.1 to examine the impact of correlation on size and power. We set sample size $n = 40, 100$ and dimension $p = 1000$... We set $\kappa = 0.5$ when implementing the SPT and the MPT... The quantile approach $\hat{\rho}_2$ is used to estimate pairwise correlation among $Z_k$'s. We set the type I error rate $\alpha = 0.05$. All simulation results are based on 10,000 independent replications.
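
The steps of Algorithm 1 quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `ridge_direction` and `mpt_statistic` are hypothetical. The penalty is taken to be a ridge penalty, $P_\lambda(w) = (\lambda/2)\lVert w \rVert^2$, only because its stationary point has a closed form; the summary does not say which penalty the paper uses. The correlation estimate $\hat{\rho} = \max(0, 1 - s_Z^2)$ is a simple moment-based stand-in, not the quantile estimator $\hat{\rho}_2$ mentioned in the Experiment Setup row. Since the critical value $c(m, \alpha/2)$ is not defined in this summary, the sketch stops at the statistic $M_{\hat{\rho}}$ and leaves the rejection rule to the caller.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # inv_cdf plays the role of Phi^{-1}


def ridge_direction(cov, mean, lam):
    """Stationary point of 0.5*w'Sw - mean'w + (lam/2)*||w||^2.

    ASSUMPTION: ridge penalty as a stand-in for the unspecified P_lambda.
    """
    p = cov.shape[0]
    return np.linalg.solve(cov + lam * np.eye(p), mean)


def mpt_statistic(X, m=20, kappa=0.5, lam=1.0, seed=0):
    """Return (M, rho_hat, p-values) for an (n, p) data matrix X with p >= 2."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n1 = int(kappa * n)          # size of the direction-estimation half
    n2 = n - n1                  # size of the testing half
    pvals = []
    for _ in range(m):
        perm = rng.permutation(n)                    # Step 1: one permutation
        D1, D2 = X[perm[:n1]], X[perm[n1:]]          # Step 2(1): split
        xbar1 = D1.mean(axis=0)
        cov1 = np.atleast_2d(np.cov(D1, rowvar=False))
        w = ridge_direction(cov1, xbar1, lam)        # Step 2(2)
        y = D2 @ w                                   # Step 2(3): project
        T = math.sqrt(n2) * y.mean() / y.std(ddof=1) # Step 2(4)
        # Step 2(5): 2*(1 - Phi(|T|)) equals erfc(|T|/sqrt(2)), which is
        # numerically stable in the tails.
        p_k = math.erfc(abs(T) / math.sqrt(2.0))
        # Clip so that Phi^{-1} stays finite.
        pvals.append(min(max(p_k, 1e-300), 1.0 - 1e-12))
    z = np.array([STD_NORMAL.inv_cdf(p) for p in pvals])  # Step 3(1)
    # ASSUMPTION: moment-based estimate, since E[s_Z^2] = 1 - rho for
    # exchangeable unit-variance Z_k.
    rho_hat = float(min(max(1.0 - z.var(ddof=1), 0.0), 1.0))
    M = float(z.mean() / math.sqrt((1.0 + (m - 1) * rho_hat) / m))  # Step 3(2)
    return M, rho_hat, pvals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 200))   # n = 40, p = 200, drawn under H0
    M, rho_hat, _ = mpt_statistic(X)
    print(f"M = {M:.3f}, rho_hat = {rho_hat:.3f}")
```

The statistic $T_{\hat{w}_k}$ does not change when $\hat{w}_k$ is rescaled, so replacing `ridge_direction` with a sparse or folded-concave solver only changes how the direction is estimated; the rest of the loop stays the same.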
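
The simulation design quoted in the Experiment Setup row can be sketched as a small data generator. The function names (`cs_cov`, `ar_cov`, `mean_vector`, `draw_sample`) are hypothetical, and one detail is an assumption: for the multivariate $t_6$ case, $\Sigma$ is used here as the scale matrix, so the resulting covariance is $\tfrac{6}{4}\Sigma$; the summary does not say whether the paper rescales it.

```python
import numpy as np


def cs_cov(p, r):
    """Compound symmetry: 1 on the diagonal, r everywhere else."""
    return np.full((p, p), float(r)) + (1.0 - r) * np.eye(p)


def ar_cov(p, r):
    """Autocorrelation: sigma_ij = r ** |i - j|."""
    idx = np.arange(p)
    return float(r) ** np.abs(idx[:, None] - idx[None, :])


def mean_vector(p, c, k=10):
    """c * mu, where mu has ones in its first k entries and zeros elsewhere."""
    mu = np.zeros(p)
    mu[:k] = 1.0
    return c * mu


def draw_sample(n, p, c, cov, dist="normal", df=6, seed=0):
    """Draw n rows from N_p(c*mu, cov) or a multivariate t with df degrees of freedom."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)
    noise = rng.standard_normal((n, p)) @ L.T
    if dist == "t":
        # ASSUMPTION: cov is the scale matrix of the t distribution.
        # Each row is divided by sqrt(chi2_df / df), one draw per row.
        g = rng.chisquare(df, size=(n, 1)) / df
        noise = noise / np.sqrt(g)
    return mean_vector(p, c) + noise


if __name__ == "__main__":
    # One power setting from the quoted design: n = 40, p = 1000, c = 0.5, AR with r = 0.5.
    X = draw_sample(n=40, p=1000, c=0.5, cov=ar_cov(1000, 0.5))
    print(X.shape)
```

Setting `c=0` gives data under the null for checking size, and looping `r` over 0.1 to 0.9 reproduces the correlation grid described in the row.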