Double Generative Adversarial Networks for Conditional Independence Testing

Authors: Chengchun Shi, Tianlin Xu, Wicher Bergsma, Lexin Li

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset."
Researcher Affiliation | Academia | "Chengchun Shi EMAIL, Tianlin Xu EMAIL, Wicher Bergsma EMAIL, Department of Statistics, London School of Economics and Political Science; Lexin Li EMAIL, Department of Biostatistics and Epidemiology, University of California at Berkeley"
Pseudocode | Yes | "Algorithm 1: Algorithm for computing the test statistic. Algorithm 2: Algorithm for computing the p-value."
Open Source Code | Yes | "A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit."
Open Datasets | Yes | "We illustrate our proposed test with an anti-cancer drug dataset from the Cancer Cell Line Encyclopedia (Barretina et al., 2012)."
Dataset Splits | Yes | "To help reduce the type-I error, we further employ a data splitting and cross-fitting strategy... We begin by dividing the data into L folds of equal size... For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3."
Hardware Specification | Yes | "All experiments were run on a Google Cloud Computing platform with 16 N1 CPUs."
Software Dependencies | No | The paper mentions a 'Python implementation' and the use of 'GANs' and 'neural networks', but it does not specify any version numbers for Python or for any specific libraries/frameworks used for the GANs or neural networks.
Experiment Setup | Yes | "For the number of functions B in Algorithm 2, it represents a trade-off... we fix B = 30. For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3. Besides, we set the number of bootstrap samples J = 1000. For the GANs, we use a single-hidden-layer neural network to approximate both the discriminator and the generator. The number of nodes in the hidden layer is set at 128. The dimension of the input noise v^(m)_{i,X} and v^(m)_{i,Y} is set at 10."
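The data-splitting step quoted in the Dataset Splits row (dividing the data into L folds of equal size, with L = 3) can be sketched as follows. This is a minimal illustration, not the actual dgcit code; `split_into_folds` is a hypothetical helper name, and the shuffling and fold-assignment details are assumptions.

```python
import random

def split_into_folds(n, num_folds=3, seed=0):
    """Partition sample indices 0..n-1 into num_folds disjoint folds of
    (near-)equal size. Hypothetical helper; the released dgcit code may
    implement this step differently."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::num_folds] for k in range(num_folds)]

# Cross-fitting idea: for each held-out fold, the GANs are fit on the
# remaining folds and the test statistic is evaluated on the held-out
# fold; the per-fold statistics are then aggregated.
folds = split_into_folds(12, num_folds=3)
held_out = folds[0]
train_idx = [i for k, fold in enumerate(folds) if k != 0 for i in fold]
```

With n = 12 and L = 3 this yields three disjoint folds of four indices each, eight of which form the training set for the first cross-fit.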
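The Experiment Setup row fixes the number of bootstrap samples at J = 1000. As a rough illustration of the final step of a bootstrap-based p-value computation (Algorithm 2 in the paper is more involved, maximizing over B = 30 functions), a generic one-sided bootstrap p-value looks like this; `bootstrap_p_value` is a hypothetical name and the simulated statistics below are placeholder data, not results from the paper.

```python
import random

def bootstrap_p_value(t_obs, t_boot):
    """Fraction of bootstrap statistics at least as large as the observed
    one, with the usual +1 finite-sample correction. Illustrative only;
    not the paper's Algorithm 2."""
    exceed = sum(1 for t in t_boot if t >= t_obs)
    return (1 + exceed) / (1 + len(t_boot))

rng = random.Random(1)
J = 1000  # number of bootstrap samples, as fixed in the paper
t_boot = [rng.gauss(0.0, 1.0) for _ in range(J)]  # placeholder null draws
p_null = bootstrap_p_value(0.0, t_boot)      # statistic near the null
p_extreme = bootstrap_p_value(5.0, t_boot)   # large observed statistic
```

A statistic far in the tail of the bootstrap distribution yields a small p-value, while one near its center yields a p-value close to 0.5.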
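The GAN architecture reported in the Experiment Setup row (a single hidden layer of 128 nodes, input noise of dimension 10) can be sketched as a pure-Python forward pass. The Gaussian initialization, the ReLU activation, and a scalar output are assumptions; the quoted excerpt specifies only the layer width and the noise dimension.

```python
import random

HIDDEN = 128     # hidden-layer width reported in the paper
NOISE_DIM = 10   # dimension of the input noise v

def init_layer(n_in, n_out, rng):
    # Gaussian initialization is an assumption; the excerpt does not say.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layer, relu=False):
    # Affine map followed by an optional ReLU nonlinearity.
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) if relu else v for v in out]

rng = random.Random(0)
# Generator sketch: noise vector (dim 10) -> 128 ReLU units -> one output.
gen_hidden = init_layer(NOISE_DIM, HIDDEN, rng)
gen_output = init_layer(HIDDEN, 1, rng)

noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
hidden = forward(noise, gen_hidden, relu=True)
pseudo_sample = forward(hidden, gen_output)[0]
```

The discriminator, per the quoted setup, shares the same single-hidden-layer shape; in practice both networks would be built and trained in a deep-learning framework rather than by hand.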