Double Generative Adversarial Networks for Conditional Independence Testing

Authors: Chengchun Shi, Tianlin Xu, Wicher Bergsma, Lexin Li

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset."
Researcher Affiliation | Academia | "Chengchun Shi EMAIL, Tianlin Xu EMAIL, Wicher Bergsma EMAIL, Department of Statistics, London School of Economics and Political Science; Lexin Li EMAIL, Department of Biostatistics and Epidemiology, University of California at Berkeley"
Pseudocode | Yes | "Algorithm 1: Algorithm for computing the test statistic. Algorithm 2: Algorithm for computing the p-value."
Open Source Code | Yes | "A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit."
Open Datasets | Yes | "We illustrate our proposed test with an anti-cancer drug dataset from the Cancer Cell Line Encyclopedia (Barretina et al., 2012)."
Dataset Splits | Yes | "To help reduce the type-I error, we further employ a data splitting and cross-fitting strategy... We begin by dividing the data into L folds of equal size... For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3."
Hardware Specification | Yes | "All experiments were run on a Google Cloud Computing platform with 16 N1 CPUs."
Software Dependencies | No | The paper mentions a 'Python implementation' and the use of 'GANs' and 'neural networks', but it does not specify any version numbers for Python or for any specific libraries/frameworks used for the GANs or neural networks.
Experiment Setup | Yes | "For the number of functions B in Algorithm 2, it represents a trade-off... we fix B = 30. For the number of pseudo samples M, and the number of sample splittings L, we find the results are not overly sensitive to their choices, and thus we fix M = 100 and L = 3. Besides, we set the number of bootstrap samples J = 1000. For the GANs, we use a single-hidden-layer neural network to approximate both the discriminator and the generator. The number of nodes in the hidden layer is set at 128. The dimension of the input noise v^(m)_{i,X} and v^(m)_{i,Y} is set at 10."
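The data-splitting step quoted in the Dataset Splits row (dividing the data into L folds of equal size, with L = 3) can be sketched as follows. This is a minimal illustration, not the actual dgcit code; `split_into_folds` is a hypothetical helper name, and the shuffling and fold-assignment details are assumptions.

```python
import random

def split_into_folds(n, num_folds=3, seed=0):
    """Partition sample indices 0..n-1 into num_folds disjoint folds of
    (near-)equal size. Hypothetical helper; the released dgcit code may
    implement this step differently."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::num_folds] for k in range(num_folds)]

# Cross-fitting idea: for each held-out fold, the GANs are fit on the
# remaining folds and the test statistic is evaluated on the held-out
# fold; the per-fold statistics are then aggregated.
folds = split_into_folds(12, num_folds=3)
held_out = folds[0]
train_idx = [i for k, fold in enumerate(folds) if k != 0 for i in fold]
```

With n = 12 and L = 3 this yields three disjoint folds of four indices each, eight of which form the training set for the first cross-fit.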
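The Experiment Setup row fixes the number of bootstrap samples at J = 1000. As a rough illustration of the final step of a bootstrap-based p-value computation (Algorithm 2 in the paper is more involved, maximizing over B = 30 functions), a generic one-sided bootstrap p-value looks like this; `bootstrap_p_value` is a hypothetical name and the simulated statistics below are placeholder data, not results from the paper.

```python
import random

def bootstrap_p_value(t_obs, t_boot):
    """Fraction of bootstrap statistics at least as large as the observed
    one, with the usual +1 finite-sample correction. Illustrative only;
    not the paper's Algorithm 2."""
    exceed = sum(1 for t in t_boot if t >= t_obs)
    return (1 + exceed) / (1 + len(t_boot))

rng = random.Random(1)
J = 1000  # number of bootstrap samples, as fixed in the paper
t_boot = [rng.gauss(0.0, 1.0) for _ in range(J)]  # placeholder null draws
p_null = bootstrap_p_value(0.0, t_boot)      # statistic near the null
p_extreme = bootstrap_p_value(5.0, t_boot)   # large observed statistic
```

A statistic far in the tail of the bootstrap distribution yields a small p-value, while one near its center yields a p-value close to 0.5.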
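The GAN architecture reported in the Experiment Setup row (a single hidden layer of 128 nodes, input noise of dimension 10) can be sketched as a pure-Python forward pass. The Gaussian initialization, the ReLU activation, and a scalar output are assumptions; the quoted excerpt specifies only the layer width and the noise dimension.

```python
import random

HIDDEN = 128     # hidden-layer width reported in the paper
NOISE_DIM = 10   # dimension of the input noise v

def init_layer(n_in, n_out, rng):
    # Gaussian initialization is an assumption; the excerpt does not say.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layer, relu=False):
    # Affine map followed by an optional ReLU nonlinearity.
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) if relu else v for v in out]

rng = random.Random(0)
# Generator sketch: noise vector (dim 10) -> 128 ReLU units -> one output.
gen_hidden = init_layer(NOISE_DIM, HIDDEN, rng)
gen_output = init_layer(HIDDEN, 1, rng)

noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
hidden = forward(noise, gen_hidden, relu=True)
pseudo_sample = forward(hidden, gen_output)[0]
```

The discriminator, per the quoted setup, shares the same single-hidden-layer shape; in practice both networks would be built and trained in a deep-learning framework rather than by hand.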