reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Kernel-based Test of Independence for Cluster-correlated Data

Authors: Hongjiao Liu, Anna Plantinga, Yunhua Xiang, Michael Wu

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Based on both simulation studies and real data analysis, we show that, with clustered data, our approach effectively controls type I error and has a higher statistical power than competing methods.
Researcher Affiliation	Academia	Hongjiao Liu, Department of Biostatistics, University of Washington; Anna M. Plantinga, Department of Mathematics and Statistics, Williams College; Yunhua Xiang, Department of Biostatistics, University of Washington; Michael C. Wu, Public Health Sciences Division, Fred Hutchinson Cancer Research Center
Pseudocode	No	The paper does not include any pseudocode or algorithm blocks.
Open Source Code	Yes	All of our codes are implemented in R, and are available at https://github.com/Liujiao92/HSICcl.
Open Datasets	Yes	Here we apply HSICcl and competing methods to test the dependence between the overall vaginal microbiome composition and different metabolic pathways, using data from the Menopause Strategies: Finding Lasting Answers for Symptoms and Health (MsFLASH) Vaginal Health Trial [27].
Dataset Splits	No	The paper does not specify explicit training, validation, or test dataset splits. It mentions using m clusters and d time points for simulations and real data analysis, but no partitioning for model training/validation/testing.
Hardware Specification	No	The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper states 'All of our codes are implemented in R' but does not specify the version of R or any specific R libraries with their version numbers.
Experiment Setup	Yes	For both X and Y, we consider two different kernels: the Gaussian kernel $k_X(z_1, z_2) = k_Y(z_1, z_2) = \exp(-\|z_1 - z_2\|_2^2/\tau)$ and the linear kernel $k_X(z_1, z_2) = k_Y(z_1, z_2) = z_1^T z_2$. For the Gaussian kernel, the shape parameter $\tau$ is chosen as the median of the Euclidean distance between each sample pair.
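
The kernel choices quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' R implementation from the HSICcl repository. The function names (`median_heuristic`, `gaussian_kernel_matrix`, `linear_kernel_matrix`) and the toy samples are assumptions made for illustration, and the sketch reads the quoted text literally: the squared Euclidean distance is divided by τ, and τ is the median of the unsquared pairwise distances.

```python
import math
from itertools import combinations
from statistics import median


def squared_distance(z1, z2):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(z1, z2))


def median_heuristic(samples):
    """Median Euclidean distance over all distinct sample pairs (assumed reading of the quoted rule)."""
    distances = [
        math.sqrt(squared_distance(a, b)) for a, b in combinations(samples, 2)
    ]
    return median(distances)


def gaussian_kernel_matrix(samples, tau=None):
    """Gram matrix with entries exp(-||z_i - z_j||^2 / tau); tau defaults to the median heuristic."""
    if tau is None:
        tau = median_heuristic(samples)
    return [
        [math.exp(-squared_distance(a, b) / tau) for b in samples]
        for a in samples
    ]


def linear_kernel_matrix(samples):
    """Gram matrix with entries z_i^T z_j."""
    return [
        [sum(a_k * b_k for a_k, b_k in zip(a, b)) for b in samples]
        for a in samples
    ]


# Hypothetical toy data: three 2-dimensional samples.
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
tau = median_heuristic(samples)
K_gauss = gaussian_kernel_matrix(samples, tau)
K_lin = linear_kernel_matrix(samples)
```

In a kernel independence test, one such Gram matrix would be computed for X and another for Y; how those matrices are combined into the HSICcl statistic is described in the paper and is not reproduced here.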