Monitoring Primitive Interactions During the Training of DNNs
Authors: Jie Ren, Xinhao Zheng, Jiyu Liu, Andrew Lizarraga, Ying Nian Wu, Liang Lin, Quanshi Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted two experiments. In the first, we verified that the top-ranked feature components represented most of the signal in f. For each DNN, we fed an input sample x to the DNN trained after K different epochs and extracted K feature vectors f^(1), ..., f^(K). Using the feature vectors collected from different samples at the K epochs, we conducted PCA to compute the eigenvalues in Figure 2. In most DNNs, the top 10 eigenvalues were significantly larger than the rest, and the long-tail components with very small eigenvalues did not reflect essential signals for the task. Therefore, we set r = 10 in all experiments. In the second experiment, we compared the classification accuracy of using the entire feature f with the accuracy of using only its top 10 components, f̂ = Σ_{i=1}^{10} f_i. To this end, we masked the other feature components to obtain f̂ = Σ_{i=1}^{10} f_i + ε, according to Eq. (3), and fed f̂ back to the network for inference. We conducted experiments on four datasets, including the census, commercial, MNIST, and CIFAR-10 datasets. For each dataset, we randomly sampled 100 samples and evaluated the classification accuracy of the network on the original feature f. Table 1 shows that using only the top 10 feature components did not significantly change the classification accuracy. |
| Researcher Affiliation | Academia | Jie Ren (1), Xinhao Zheng (1), Jiyu Liu (1,2)*, Andrew Lizarraga (3), Ying Nian Wu (3), Liang Lin (4), Quanshi Zhang (1). Affiliations: (1) Shanghai Jiao Tong University; (2) Dartmouth College; (3) University of California, Los Angeles; (4) Sun Yat-Sen University |
| Pseudocode | No | The paper describes methods and definitions in mathematical and textual forms but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology described is publicly available, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We trained a 5-layer MLP (Ren et al. 2023b) (namely MLP-5) and an 8-layer MLP (Ren et al. 2023b) (namely MLP-8) on three datasets (Dua and Graff 2017), including the census income (namely income), TV News channel commercial detection (namely TV news), and bike sharing (namely bike) datasets. We also followed (Li and Zhang 2023) to train a CNN and a three-layer unidirectional LSTM on the SST-2 dataset (Socher et al. 2013). Besides, we trained VGG-11 (Simonyan and Zisserman 2014) and ResNet-18/20 (He et al. 2016) (namely RN-18/20) on the MNIST (LeCun et al. 1998), CIFAR-10 (Krizhevsky 2012), and Tiny ImageNet (Le and Yang 2015) datasets, and trained PointNet (Charles et al. 2017) on the ShapeNet (Yi et al. 2016) dataset. |
| Dataset Splits | No | The paper mentions using various datasets (income, TV news, bike, SST-2, MNIST, CIFAR-10, Tiny Image Net, Shape Net) and states that experiments were conducted on them, but it does not provide specific details about how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions training various models like MLPs, CNNs, LSTMs, VGG-11, ResNet-18/20, and PointNet, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper mentions training specific neural network architectures (e.g., 5-layer MLP, 8-layer MLP, VGG-11, ResNet-18/20) on various datasets. It also states that for the principal component analysis, 'we set r = 10 in all experiments' and that they extracted features from 'the (roughly) half depth' of the DNN. However, it lacks concrete details regarding hyperparameters such as learning rates, batch sizes, number of epochs, optimizers, or other system-level training configurations typically found in an experimental setup section. |
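The two-step check quoted in the Research Type row (inspect the PCA eigenvalue spectrum, then keep only the top r = 10 components) can be sketched in NumPy. This is a minimal illustration on synthetic features, not the authors' code: the feature matrix, its dimensions, and all variable names are assumptions; only the choice r = 10 follows the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the feature vectors f^(1), ..., f^(K) collected
# across samples and epochs: 500 vectors of dimension 64, constructed so that
# most variance lies in 10 directions (mimicking the spectrum in Figure 2).
basis = rng.normal(size=(10, 64))
F = rng.normal(size=(500, 10)) @ basis + 0.01 * rng.normal(size=(500, 64))

# Step 1: PCA via eigen-decomposition of the covariance of centered features.
Fc = F - F.mean(axis=0)
cov = Fc.T @ Fc / (len(Fc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

r = 10  # the paper fixes r = 10 after finding the top 10 eigenvalues dominate
print("top-10 variance share:", eigvals[:r].sum() / eigvals.sum())

# Step 2: mask everything but the top-r components, i.e. project the centered
# feature onto the top-r eigenvectors and reconstruct f_hat from them alone.
P = eigvecs[:, :r]                          # (64, r) projection basis
F_hat = Fc @ P @ P.T + F.mean(axis=0)       # keep only sum_{i=1}^{r} f_i

print("relative reconstruction error:",
      np.linalg.norm(F - F_hat) / np.linalg.norm(F))
```

With a spectrum like the one described (top 10 eigenvalues much larger than the long tail), the top-10 variance share is close to 1 and the reconstruction error is small, which is the condition under which feeding f̂ back to the network leaves accuracy essentially unchanged.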