DART: Distance Assisted Recursive Testing

Authors: Xuechan Li, Anthony D. Sung, Jichun Xie

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretical analysis and numerical experiments demonstrated that DART generates valid, robust, and powerful results. We applied DART to a clinical trial in the allogeneic stem cell transplantation study to identify the gut microbiota whose abundance was impacted by post-transplant care.
Researcher Affiliation | Academia | Xuechan Li, Department of Biostatistics, Duke University, Durham, NC 27705, USA; Anthony D. Sung, Department of Medicine, Duke University, Durham, NC 27705, USA; Jichun Xie, Department of Biostatistics, Duke University, Durham, NC 27705, USA
Pseudocode | Yes | Algorithm 1: Transform the distance matrix into an aggregation tree. Algorithm 2: Recursive testing embedded in the tree.
Open Source Code | Yes | Experiment code can be found in https://github.com/jichunxie/DART_manu_support.git. We also built an R package, which can be found in https://github.com/jichunxie/DART.git.
Open Datasets | No | We applied DART to a clinical trial on hematopoietic stem cell transplantation (HCT)... In our data, patient fecal samples were collected before and after HCT; the fecal microbiome are sequenced by the 16S ribosomal RNA sequencing at the Memorial Sloan Kettering Cancer Center. ... The microbiome samples were collected and sequenced at Memorial Sloan Kettering Cancer Center (MSKCC) and pre-processed at Duke Cancer Institute (DCI) Bioinformatics Shared Resource (BSR).
Dataset Splits | No | The paper describes data pre-processing and the final dataset size (456 microbiome samples from 126 leukemia patients, 866 ASVs), but no explicit training/test/validation splits are provided for either the simulated or real-world experiments.
Hardware Specification | Yes | All the experiments were conducted on 2.10 GHz Intel Xeon Gold 6252 processors with 16 Gb memory at the Duke Compute Cluster. We requested 80 cores when running the simulation experiments to save time.
Software Dependencies | No | The data were then pre-processed by the R package, DADA2 (Callahan et al., 2016)... We also built an R package, which can be found in https://github.com/jichunxie/DART.git. ...the distance matrix is calculated among the remaining 857 non-reference ASV using the R package Phangorn (Schliep, 2011) based on the JC69 model (Jukes et al., 1969).
Experiment Setup | Yes | Under different nominal FDR levels α ∈ {0.05, 0.1, 0.15, 0.2}, we compared the performance of DART and its competitors... For DART, based on the tuning parameter selection criterion in Section 3.4, we set M = 3 and constructed a 3-layer aggregation tree, with distance thresholds g(2) = 0.88 and g(3) = 1.52. For the two FDRL procedures, ... we set k = 5 in our numerical experiments too. For AdaPT, we followed the instructions found at https://cran.r-project.org/web/packages/adaptMT/vignettes/adapt_demo.html to set up its tuning parameters. ... Based on the tuning parameter selection procedure described in Section 2.3, we construct an aggregation tree with M = 3, L = 3. The set of possible threshold G is set as {4, 8, 12, 16}/√(n log m log log m), with n = 456 and m = 857, and we choose g(2) = g(3) = 16/√(n log m log log m) = 0.21.
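
The reported threshold of 0.21 in the real-data setup can be checked with a few lines of arithmetic. The sketch below reads the candidate grid as {4, 8, 12, 16} divided by the square root of n log m log log m, using natural logarithms. Both the square root and the log base are assumptions, chosen because they reproduce the reported value with n = 456 and m = 857. The function name `threshold_grid` is hypothetical and is not part of the DART R package.

```python
import math


def threshold_grid(n, m, constants=(4, 8, 12, 16)):
    """Candidate distance thresholds c / sqrt(n * log(m) * log(log(m))).

    Assumption: natural logarithms and a square root in the denominator,
    inferred from the reported value g = 0.21 for n = 456, m = 857.
    """
    scale = math.sqrt(n * math.log(m) * math.log(math.log(m)))
    return [c / scale for c in constants]


# n = number of microbiome samples, m = number of non-reference ASVs
grid = threshold_grid(n=456, m=857)

# The largest candidate (c = 16) is the one reported as g(2) = g(3)
g_chosen = grid[-1]
print(round(g_chosen, 2))  # 0.21
```

Without the square root, the same expression gives about 0.003, which does not match the reported threshold. That mismatch is the main reason for the reading used here.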