Testing Conditional Mean Independence Using Generative Neural Networks
Authors: Yi Zhang, Linjun Huang, Yun Yang, Xiaofeng Shao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach demonstrates strong empirical performance in scenarios with high-dimensional covariates and response variables, can handle multivariate responses, and maintains nontrivial power against local alternatives outside an n^{-1/2} neighborhood of the null hypothesis. We also use numerical simulations and real-world imaging data applications to highlight the efficacy and versatility of our testing procedure. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA. 2Department of Mathematics, University of Maryland, College Park, MD, USA. 3Department of Statistics and Data Science, and Department of Economics, Washington University in St Louis, St Louis, MO, USA. |
| Pseudocode | Yes | Algorithm 1: Training Conditional Generator Ĝ |
| Open Source Code | Yes | A Python implementation of our proposed test procedure is available at https://github.com/LinjunHuang86749/Testing-CMI-Using-Generative-NN. |
| Open Datasets | Yes | We examine whether covering specific facial regions affects facial expression prediction accuracy using the FER2013 dataset (Goodfellow et al., 2013)... we investigate the impact of covering specific facial regions on age prediction accuracy using the cropped and aligned UTKFace dataset (Zhang et al., 2017), available at https://www.kaggle.com/datasets/abhikjha/utk-face-cropped. |
| Dataset Splits | Yes | To compute the test accuracy, we adopt the following train-test split procedure: select half of the images from each emotion label (recall that there are 7 emotions) as the testing set (with a sample size of 5850), and use the remaining images as the training set (with a sample size of 5850). |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU or CPU models) used for running its experiments. It mentions using a VGG network and an EfficientNet-B0 model but not the hardware specifications for training them. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'leaky ReLU activation function', but it does not specify version numbers for these or for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | To train the DNN ĝ_Y, we use the model structure and hyperparameters outlined in the following table with the MSE loss. To train the GMMN Ĝ, we follow Algorithm 1 with loss L_X = L_X^l + L_X^g... The table below shows how the above-mentioned hyperparameters are selected in each section. Other hyperparameters of the GMMN used only in specific sections are discussed in Appendix A.1-A.3. |
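The GMMN generator Ĝ referenced in the setup is trained by minimizing a Maximum Mean Discrepancy (MMD) between generated and real samples. As a rough, hedged sketch of that core loss component (not the authors' exact L_X = L_X^l + L_X^g decomposition, and with an illustrative bandwidth choice), a biased squared-MMD estimate with a Gaussian kernel can be written as:

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and b.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(generated, real, bandwidth=1.0):
    # Biased estimate of squared MMD between generated and real samples;
    # a GMMN is trained by minimizing this quantity over the generator.
    k_gg = gaussian_kernel(generated, generated, bandwidth).mean()
    k_rr = gaussian_kernel(real, real, bandwidth).mean()
    k_gr = gaussian_kernel(generated, real, bandwidth).mean()
    return k_gg + k_rr - 2.0 * k_gr
```

Identical samples give an MMD of zero, while samples from well-separated distributions give a strictly positive value; in practice the paper's procedure combines kernels at multiple scales (the local/global terms in L_X).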
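The train-test split described above selects half of the images within each emotion label. A minimal sketch of such a stratified 50/50 split (function and parameter names here are illustrative, not from the paper's code) might look like:

```python
import random
from collections import defaultdict

def stratified_half_split(labels, seed=0):
    # Split sample indices 50/50 within each class label, mirroring the
    # per-emotion train/test procedure described in the paper.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    train, test = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        half = len(idxs) // 2
        test.extend(idxs[:half])   # half of each class to the test set
        train.extend(idxs[half:])  # remainder to the training set
    return train, test
```

Splitting within each label keeps the class proportions identical in both halves, which matters for computing a fair test accuracy across all 7 emotions.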