Semi-Supervised Multimodal Classification Through Learning from Modal and Strategic Complementarities

Authors: Junchi Chen, Richong Zhang, Junfan Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical studies demonstrate the effectiveness of the proposed framework." "Experimental analyses demonstrate the effectiveness of our method."
Researcher Affiliation | Academia | "1 CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; 2 School of Software, Beihang University, Beijing, China; 3 Zhongguancun Laboratory, Beijing, China"
Pseudocode | No | The paper describes its methods in narrative text and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our codes will be released in https://github.com/cjc20000323/SSMC."
Open Datasets | Yes | "We select three datasets to construct benchmarks for SSMC: N24News (Wang et al. 2021), UPMC-Food101 (Bossard, Guillaumin, and Van Gool 2014), CrisisMMD (Alam, Ofli, and Imran 2018)."
Dataset Splits | No | "We split the training set into 20, 50, 100 labeled pairs." This specifies the number of labeled samples used for semi-supervised learning but does not provide complete train/validation/test splits for the datasets, including the unlabeled pool or how the test sets were constructed.
Hardware Specification | No | The paper does not mention the hardware (GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | "We use BERT and ViT to encode texts and images. MSC is combined with FixMatch and FreeMatch. We use RandAugment to gain augmented images and obtain augmented texts with swap and synonym strategy. We employ Sentence-BERT to calculate similarity between augmented and original texts... We select AdamW as optimizer." The paper names these software components but does not provide version numbers for any of them.
Experiment Setup | Yes | "The batch size of labeled data Bx is set to 4 and batch size of unlabeled data Bu is set to 32. We set 5e-5 as learning rate. We use RandAugment to gain augmented images and obtain augmented texts with swap and synonym strategy. We employ Sentence-BERT to calculate similarity between augmented and original texts, choosing the text with higher similarity as the weak augmentation and the other one as the strong augmentation. We select AdamW as optimizer. η is set to 0.95. The train epoch is set to 20. β1, β2, β3 are set to 1 for N24News and UPMC-Food101 and set to 0.6 for CrisisMMD."
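The weak/strong assignment described in the Experiment Setup row (pick the augmented text more similar to the original as the weak augmentation, the other as the strong one) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and a simple token-overlap Jaccard similarity stands in for the Sentence-BERT similarity the paper actually uses.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for Sentence-BERT cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def assign_weak_strong(original: str, aug_a: str, aug_b: str) -> tuple[str, str]:
    """Return (weak, strong): the candidate closer to the original text
    becomes the weak augmentation, the other the strong augmentation."""
    if jaccard_similarity(original, aug_a) >= jaccard_similarity(original, aug_b):
        return aug_a, aug_b
    return aug_b, aug_a


# Example: a light synonym replacement stays close to the original,
# so it is assigned as the weak augmentation.
weak, strong = assign_weak_strong(
    "the quick brown fox jumps",
    "the quick brown fox leaps",   # 4 of 6 unique tokens shared -> higher similarity
    "a speedy dark fox hops",      # 1 of 9 unique tokens shared -> lower similarity
)
```

With an embedding-based similarity in place of Jaccard, the same selection logic applies unchanged; only `jaccard_similarity` would be swapped out.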