TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality

Authors: Yinsong Wang, Shahin Shahrampour

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We provide extensive numerical simulations using real-world datasets to show that TAP can provide statistically significant improvement in generalization across different domains and different neural network architectures, making use of seemingly unusable unlabeled cross-modal data.' 'We provide detailed simulations on three real-world datasets in different domains to examine various aspects of TAP and demonstrate that the integration of TAP into a neural network can provide statistically significant improvement in generalization using the unlabeled modality. We also provide detailed ablation studies to investigate the best configuration for TAP in practice, including the choice of kernel, the choice for latent space transformation, and compatibility with CNN and Transformer-based backbone feature extractors with an additional text-image dataset.'
Researcher Affiliation | Academia | Yinsong Wang (EMAIL), Department of Mechanical and Industrial Engineering, Northeastern University; Shahin Shahrampour (EMAIL), Department of Mechanical and Industrial Engineering, Northeastern University.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes its methods through mathematical formulations and descriptive text, accompanied by figures visualizing the architecture.
Open Source Code | No | The paper states that 'Full implementation details for all experiments in this section can be found in the Appendix for reproducibility' but provides no concrete access to source code, such as a repository link or an explicit statement of code release.
Open Datasets | Yes | 'Datasets: To ensure a comprehensive evaluation of the performance of TAP integration, we select/create three real-world cross-modal datasets in three different areas. All datasets are open-access and can be found online. A detailed dataset and pre-processing description can be found in the Appendix.' Computer Vision: the MNIST dataset (MNIST) (Deng, 2012). Healthcare: the Activity dataset (Activity) (Mohino-Herranz et al., 2019), where Electrodermal Activity (EDA) signals are the primary modality X for predicting subject activity. Remote Sensing: the Crop dataset (Crop) (Khosravi et al., 2018; Khosravi & Alavipanah, 2019). The paper also carries out a test on a fourth dataset, the Memotion 7K dataset (Sharma et al., 2020).
Dataset Splits | Yes | Similar to semi-supervised learning, the motivation behind utilizing unlabeled data points is the limited availability of labeled data. So, for each dataset, 200 data points in the primary modality are randomly sampled to serve as the training data, and 1000 data points in the secondary modality are randomly sampled to serve as the cross-modal reference data. For MNIST, this means 200 upper-half images as primary-modality training data and 1000 lower-half images as reference data. All remaining data points serve as the evaluation data. The Memotion 7K dataset requires more training data to learn a model, so 5000 data points are sampled as the training set, 1000 as the reference data, and the rest as the evaluation data. At each Monte Carlo simulation, the training, reference, and evaluation sets are reshuffled while keeping their sizes the same.
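The splitting procedure described in that row can be sketched as follows. This is a hypothetical reconstruction, not code from the paper; the function name, seeding, and index-based interface are my own assumptions.

```python
import numpy as np

def split_indices(n_primary, n_secondary, n_train=200, n_ref=1000, seed=0):
    """Randomly partition dataset indices into training / reference /
    evaluation sets, mirroring the splits described in the paper:
    n_train labeled primary-modality points, n_ref unlabeled
    secondary-modality reference points, and the remaining primary
    points held out for evaluation."""
    rng = np.random.default_rng(seed)
    primary = rng.permutation(n_primary)      # shuffle primary-modality indices
    secondary = rng.permutation(n_secondary)  # shuffle secondary-modality indices
    train = primary[:n_train]                 # labeled training data
    ref = secondary[:n_ref]                   # cross-modal reference data
    evaluation = primary[n_train:]            # all remaining points for evaluation
    return train, ref, evaluation
```

Re-calling the function with a different seed at each Monte Carlo simulation reproduces the reshuffling while keeping the split sizes fixed; for Memotion 7K one would pass `n_train=5000` instead.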
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It mentions the memory cost of the model but not the underlying hardware.
Software Dependencies | No | The paper mentions the use of PyTorch but does not specify a version for it or any other software dependency. It also refers to pre-trained models such as EfficientNet-B0 and distilled RoBERTa without their specific versions.
Experiment Setup | Yes | 'In the performance evaluation, the reference batch size for TAP is chosen as 250, which is 1.25 times the training data. The training data batch size is set to 100... The backbone neural network structure for all three datasets is a two-hidden-layer neural network with 64 hidden neurons at each layer. The activation function is ReLU with a dropout rate of 0.5. Layer normalization is implemented after each hidden layer... All models are trained with the cross-entropy loss using the Adam optimizer with a fixed learning rate of 0.0001. All models are trained for 1000 epochs (8000 in the Crop dataset) except for TAP... The learning rate is set to 0.00002 for both baseline and TAP integration. Each model is trained for 20 epochs, where we observe the validation accuracy stabilizes.'
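The backbone described in that row can be sketched in PyTorch as below. This is a hedged reconstruction from the quoted hyperparameters only: the class name, the input/output dimensions, and the exact ordering of dropout and layer normalization within each hidden block are my own assumptions, not stated in the paper.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Hypothetical sketch of the baseline backbone: two hidden layers
    of 64 units each, ReLU activations, dropout rate 0.5, and layer
    normalization after each hidden layer."""

    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(0.5), nn.LayerNorm(64),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.5), nn.LayerNorm(64),
            nn.Linear(64, n_classes),  # logits for cross-entropy loss
        )

    def forward(self, x):
        return self.net(x)

# Example dimensions (assumed): upper-half MNIST images flattened to 28 * 14 = 392
model = Backbone(in_dim=392, n_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed lr 0.0001
loss_fn = nn.CrossEntropyLoss()
```

The training batch size of 100 and epoch counts from the quote would then drive a standard training loop over this model; the TAP module itself, which consumes the 250-sample reference batches, is not reconstructed here.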