Double Generative Adversarial Networks for Conditional Independence Testing
Authors: Chengchun Shi, Tianlin Xu, Wicher Bergsma, Lexin Li
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset. |
| Researcher Affiliation | Academia | Chengchun Shi EMAIL Tianlin Xu EMAIL Wicher Bergsma EMAIL Department of Statistics, London School of Economics and Political Science Lexin Li EMAIL Department of Biostatistics and Epidemiology, University of California at Berkeley |
| Pseudocode | Yes | Algorithm 1 Algorithm for computing the test statistic. Algorithm 2 Algorithm for computing the p-value. |
| Open Source Code | Yes | A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit. |
| Open Datasets | Yes | We illustrate our proposed test with an anti-cancer drug dataset from the Cancer Cell Line Encyclopedia (Barretina et al., 2012). |
| Dataset Splits | Yes | To help reduce the type-I error, we further employ a data splitting and cross-fitting strategy... We begin by dividing the data into L folds of equal size... For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3. |
| Hardware Specification | Yes | All experiments were run on a 16 N1 CPUs Google Cloud Computing platform. |
| Software Dependencies | No | The paper mentions 'Python implementation' and the use of 'GANs' and 'neural networks', but it does not specify any version numbers for Python or any specific libraries/frameworks used for GANs or neural networks. |
| Experiment Setup | Yes | For the number of functions B in Algorithm 2, it represents a trade-off... we fix B = 30. For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3. Besides, we set the number of bootstrap samples J = 1000. For the GANs, we use a single-hidden layer neural network to approximate both the discriminator and the generator. The number of nodes in the hidden layer is set at 128. The dimension of the input noise $v_{i,X}^{(m)}$ and $v_{i,Y}^{(m)}$ is set at 10. |
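The reported setup (single-hidden-layer generator with 128 hidden nodes, 10-dimensional input noise, and L = 3 equal-size folds for data splitting) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of plain NumPy in place of a GAN training framework, and the tanh activation are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 128    # nodes in the single hidden layer (from the paper)
NOISE_DIM = 10  # dimension of the input noise v^(m)_{i,X} (from the paper)


def init_generator(z_dim, noise_dim=NOISE_DIM, hidden=HIDDEN, out_dim=1):
    """Random initial weights for a generator G(z, v) -> pseudo sample.

    Illustrative only: the real generator is trained adversarially
    against a discriminator of the same single-hidden-layer shape.
    """
    in_dim = z_dim + noise_dim
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def generate(params, z, m=1):
    """Draw m pseudo samples per row of the conditioning variable z."""
    n, _ = z.shape
    out = []
    for _ in range(m):
        v = rng.normal(size=(n, NOISE_DIM))  # fresh input noise each draw
        h = np.tanh(np.c_[z, v] @ params["W1"] + params["b1"])
        out.append(h @ params["W2"] + params["b2"])
    return np.stack(out)  # shape (m, n, out_dim)


def split_folds(n, L=3, seed=0):
    """Divide indices 0..n-1 into L folds of (near-)equal size."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, L)


# Toy usage: 30 observations, 5-dimensional conditioning variable.
params = init_generator(z_dim=5)
z = rng.normal(size=(30, 5))
pseudo = generate(params, z, m=4)     # 4 pseudo samples per observation
folds = split_folds(30, L=3)          # 3 folds for cross-fitting
print(pseudo.shape)                   # (4, 30, 1)
print([len(f) for f in folds])        # [10, 10, 10]
```

In the actual procedure, the pseudo samples feed the test statistic of Algorithm 1 and the folds support the data-splitting and cross-fitting strategy described above; here they are only shaped to match the reported hyperparameters.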